The Summer 2018 workshops are over. We’ll be back next summer with more workshops! See the Other Options for Learning page for resources available during the academic year.
Registration is required for all workshops; see links below. You’re welcome to attend one workshop or many, but you must register individually for each workshop. Please join the waitlist if a workshop is full, as there may be cancellations or opportunities for additional sessions.
There is a small registration fee ($5-$20) for all workshops to promote attendance and help cover the cost of light refreshments (coffee/tea and snacks) during the workshops. If the fee would prevent you from attending, please contact us for options.
A NetID is required to register. Workshops are open to Northwestern graduate students, postdocs, faculty, and staff. Workshop instructors, helpers, and participants are expected to adhere to the Code of Conduct.
All workshops require that you bring a laptop with a wifi connection, charged battery, and the necessary software installed ahead of the workshop. Workshop instructors and helpers will NOT be able to help with software installation during the workshop. See Software Installation for instructions.
Workshop materials will be publicly available online; they will be listed on the Workshop Materials page when available.
Trying to decide which workshops to take?
If you don’t know Python or R, we recommend you pick one language to learn well this summer. If your work primarily involves statistical analysis, exploratory data analysis, or data visualization, or you’re in a field with established R packages for analysis, you may want to start with R. If your work involves working with text data, collecting data from the web, working with numerical matrices, writing scripts, automating workflows, or building applications, you may want to start with Python. But, both languages ultimately have similar capabilities and are useful for data science and data analysis tasks.
The command line, databases, and data strategies workshops teach useful skills for both R and Python users.
The workshop descriptions include information on what previous experience participants are expected to have. Most Python and R workshops beyond the introductory ones expect participants will know the skills from the introductory workshops. However, you do not necessarily have to take the introductory workshops to acquire these skills. Your previous experience may be sufficient, or you may use online resources to learn the basics. Even if you don’t register for an introductory workshop, you are welcome to work through the workshop materials and exercises on your own as a refresher, or attend the afternoon practice sessions to work with others.
Other Options
If you’d like to learn Python in a concentrated 2-week course, consider signing up for NICO 101 instead of the summer workshops. The course is open to folks other than undergraduates; contact Prof. Pah for details.
Faculty: please register for any workshops you’re interested in. In addition, Research Computing Services is exploring the possibility of a faculty only bootcamp in September that would mix tutorials with work on your own projects. If you’re interested in this, please let us know.
Workshops by Topic
Looking for a workshop on a topic not listed here? Let us know.
Python
- Intro to Programming with Python – Evanston
- Python: Data Visualization – Evanston
- Python: Pandas – Evanston
- Python: Pandas – Chicago
- Python: NumPy and SciPy – Evanston
- Python: NumPy and SciPy – Chicago
- Python: APIs and Web Requests – Evanston
- Python: Scikit-learn – Evanston
- Python: Preparing Text as Data – Evanston
R
- R: Introduction – Evanston
- R: Intro to Shiny – Evanston
- R: Refresher – Chicago
- R: Intro to the Tidyverse – Evanston
- R: Visualization with ggplot2 – Evanston
- R: Visualization with ggplot2 – Chicago
- R: Gene Expression Analysis – Chicago
- R: Statistical Modeling and Learning – Evanston
- R: Statistical Models and Machine Learning – Chicago
Other
- Intro to the Command Line – Evanston
- Intro to Databases and SQL – Evanston
- GIS: Introduction – Evanston
- Data Strategies – Evanston
Workshop Calendar
- Evanston Workshops: purple in the calendar
- Chicago Workshops: blue in the calendar
June 2018
Mon | Tue | Wed | Thu | Fri |
---|---|---|---|---|
25 Intro to Programming with Python (1/3) |
26 R: Introduction (1/2) |
27 Intro to Programming with Python (2/3) |
28 R: Introduction (2/2) |
29 Intro to Programming with Python (3/3) |
July 2018
August 2018
Mon | Tue | Wed | Thu | Fri |
---|---|---|---|---|
1 | 2 | 3 Python: Machine Learning with Scikit-learn |
||
6 | 7 R: Gene Expression Analysis |
8 Intro to ArcGIS |
9 Data Strategies |
10 |
13 | 14 | 15 | 16 Python: Preparing Text as Data |
17 |
Workshop Descriptions
Evanston
Workshop practice session locations subject to change.
Date | Description |
---|---|
June 25, 27, 29 | Intro to Programming with Python If you’re new to programming in any language, this workshop is for you. Over three sessions, we’ll cover the basics of writing code, using Python as the language. Participants are expected to attend all three days. This workshop is for those with limited prior programming experience and will be paced accordingly. If you already know how to program, and you’re looking to learn Python, see the Resources page for alternative courses and tutorials.Things you’ll learn in this workshop:
Prerequisites: None Time and Location: 9am-12pm, ITW Classroom, 1-350, Ford Motor Company Engineering Design Center Practice Session: 1:30pm, Mudd Library, Large Classroom, Room 2210 |
June 26, 28 | R: Introduction R is a programming language and environment designed for statistical computing. If you do statistical analysis, data visualization, or data manipulation, it’s a powerful tool that is used by academics and data scientists alike. It supports reproducible workflows and helps you take analysis beyond using built-in models on relatively clean data sets as you might with other statistical analysis programs. If you’ve never used R before, or you are looking for a review of the basics, start here. This is a two session workshop, and participants are expected to attend both days. Participants are encouraged to continue learning R with R: Intro to the Tidyverse, which teaches packages that make data manipulation tasks easier, and R: Visualization with ggplot2 to learn more about data visualization in R. Together, these workshops will give you a well-rounded introduction to R. Things you’ll learn in this workshop:
Prerequisites: None Time and Location: 9am-12pm, 555 Clark St., B01 Practice Session: 1:30pm, Mudd Library, Large Classroom, Room 2210 |
July 6 | R: Intro to Shiny Shiny is an R package for making interactive visualizations and web applications with R. Tell stories with your data, and let your users interact with your data and analysis. Create dashboards, reports, or standalone websites. Things you’ll learn in this workshop:
Prerequisites: Familiarity with R, including making plots Time and Location: 9am-12pm, Mudd Library, Small Classroom, Room 2124 Practice Session: None |
July 9 | Python: Data Visualization If you’re analyzing data with Python, then you need to be able to visualize your data as well. You can plot pandas data frames directly, but for certain chart types, formats, and options, you need to use the underlying matplotlib library directly. Learning matplotlib will help you with creating other specialized data visualizations in Python as well, as most Python data visualization libraries are based on it. We’ll look at Seaborn, a library for statistical data visualization, as one example of a specialized plotting library. Things you’ll learn in this workshop:
Prerequisites: Familiarity with Python and Jupyter Notebooks Time and Location: 9am-12pm, Chambers Hall, Lower Level Practice Session: 1:30pm, Main Library, B183 |
July 10 | Python: Pandas Do you work with data of different types (numerical, categorical, text, dates)? Does your data have labelled variables or observations? Pandas DataFrames are the way to work with mixed and labeled data in Python. Pandas, the Python data analysis library, lets you perform common data analysis tasks such as subsetting and aggregating data, creating new variables, recoding data, computing summary statistics, and visualizing your data. Things you’ll learn in this workshop:
Prerequisites: Familiarity with Python and Jupyter Notebooks Time and Location: 9am-12pm, Chambers Hall, Lower Level Practice Session: 1:30pm, Main Library, B183 This workshop is also offered on the Chicago campus with the same content. Participants may also be interested in the Python: Data Visualization workshop to go beyond the few plotting basics included in this workshop. |
July 11 | Intro to the Command Line Using the command line (aka Unix shell, or terminal) is fundamental to using both your computer and more powerful cloud and cluster computational resources effectively. Whether installing and managing Python packages, running Python or R scripts, interacting with databases, using version control systems like git, running machine learning models in the cloud, or using Quest (Northwestern’s high performance supercomputer), being comfortable with the command line will open up new computational possibilities. Data scientists and researchers alike will work more efficiently with command line skills in their toolkit. Things you’ll learn in this workshop:
Prerequisites: None Time and Location: 9am-12pm, Chambers Hall, Lower Level Practice Session: 1:30pm, Mudd Library, Small Classroom, Room 2124 |
July 16, 18 | R: Intro to the Tidyverse The tidyverse is a set of R packages designed to make data manipulation and analysis tasks easier and more consistent. Whether you’re brand new to R or have been using R for years, the tidyverse packages can help you improve your workflow. This workshop will build on the R for Data Science book by Hadley Wickham and Garrett Grolemund. This is a two session workshop; participants are expected to attend both days. Workshop participants may also be interested in R: Visualization with ggplot2 to learn the ggplot2 package in more depth than what will be covered in this workshop. Things you’ll learn in this workshop:
Prerequisites: This workshop is suitable for beginners, but it’s recommended that you have experience with R at least at the level of the Introduction to R workshop. Time and Location: 9am-12pm, Chambers Hall, Lower Level Practice Session: 1:30pm, Mudd Library, Small Classroom, Room 2124 |
July 20 | Python: NumPy and SciPy NumPy and SciPy are the core libraries for scientific computing with Python. If you want to work with matrices in Python, these are the packages for you. NumPy and SciPy are also the foundation of higher level data science and machine learning packages. If you plan to use packages like TensorFlow or Scikit-learn for machine learning, understanding these packages will help you shape and work with data for both input and output. Things you’ll learn in this workshop:
Prerequisites: Familiarity with Python and Jupyter Notebooks Time and Location: 9am-12pm, Chambers Hall, Lower Level Practice Session: CANCELLED This workshop is also offered on the Chicago campus. |
July 24 | R: Visualization with ggplot2 ggplot2 is one of the most popular R packages – for good reason. It’s a data visualization framework that logically associates features of your data with elements of the plot. Once you’ve learned the basic ggplot2 syntax for making a plot, you can adapt it to make new visualizations with different plot types easily. ggplot2 makes plotting multiple data series with different lines or marker types straightforward, and it lets you create multiple plots for different groups in your data with one command. Whether you’re creating data visualizations for data exploration or publication, come learn why The New York Times and FiveThirtyEight both use ggplot2. Things you’ll learn in this workshop:
Prerequisites: Experience with R of at least the level of the Introduction to R workshop. Time and Location: 9am-12pm, ITW Classroom, 1-350 Ford Motor Company Engineering Design Center Practice Session: 1:30pm, Mudd Library, Large Classroom, Room 2210 This workshop is also offered on the Chicago campus with the same content. |
July 25 | R: Statistical Modeling and Learning You’ve learned the basics of R, but now you need to know how to actually do your analysis. If that might include running some type of regression model, this is the workshop for you. Whether you want to use regression for statistical inference or prediction, this workshop will cover the basics to help you get started. Things you’ll learn in this workshop:
Prerequisites:
Time and Location: 9am-12pm, Mudd Library, Large Classroom, Room 2210 Practice Session: 1:30pm, Mudd Library, Large Classroom, Room 2210 A similar workshop is also offered on the Chicago campus, but with content tailored to life sciences researchers. |
July 27 | Python: APIs and Web Requests If you need to collect data from a website or public platform, you’ll need to use APIs (application programming interfaces) or web requests. In this workshop, you’ll learn the basics of both paradigms, and we’ll cover some basic scraping etiquette. Things you’ll learn in this workshop:
Prerequisites: Familiarity with Python and Jupyter Notebooks Time and Location: 9am-12pm, Chambers Hall, Lower Level Practice Session: 1:30pm, location tbd |
July 30 | Intro to Databases and SQL NOTE: This is an all-day workshop. Lunch will be provided. Have you heard about databases but aren’t really sure what they are? Is your data growing beyond what you can manage in Excel? Do you need to learn SQL to access databases you want to use? Have you heard that data scientists use SQL and want to learn more? This workshop will cover the basics of database design and show you how to use SQL to work with relational databases. Things you’ll learn in this workshop:
Prerequisites: None Time and Location: 9am-4pm, Chambers Hall, Lower Level Practice Session: None |
August 3 | Python: Scikit-learn Scikit-learn is the core machine learning library for Python. It allows you to run a wide range of classification, clustering, regression, and prediction algorithms all using the same framework. It includes tools for splitting your data into test and training sets, parameter estimation using grid search and cross validation, evaluating your models, and making predictions. It works with NumPy and SciPy. If you want to use Python for predictive modeling, Scikit-learn is the place to start. Things you’ll learn in this workshop:
Prerequisites:
Time and Location: 9am-12pm, Chambers Hall, Lower Level Practice Session: 1:30pm, location tbd |
August 8 | GIS: Introduction A geographic information system (GIS) is a framework for gathering, managing, and analyzing data. It is based in the science of geography and integrates many types of data. It analyzes spatial location and organizes layers of information into visualizations using maps. With this unique capability, GIS can help you gain insights into patterns and relationships in your data that you may miss using other statistical programs. If you’ve never used GIS before, or you are looking for a review of the basics, this is a great place to start. Things you’ll learn in this workshop:
Prerequisites: None Time and Location: 9am-12pm, Mudd Library, Large Classroom, Room 2210 Practice Session: The library GIS team will be available in Mudd Library in the afternoon if you have additional questions or want to work on your own projects. Note: Computers with the necessary software will be provided for this workshop. You do not need to install anything ahead of time. |
August 9 | Data Strategies You’ve taken some of the other workshops this summer and learned new skills. But how exactly are you going to apply those skills to your data and research? This workshop will use examples of real data problems members of the community face to demonstrate how to put your Python, R, or other skills to use on real problems. Have an idea or an example of a challenge you’re facing in your own work? Email Thomas Stoeger with your questions, ideas, or (small or large) data strategy problems you have encountered. This workshop is language agnostic and will be beneficial to folks using R or Python (or other languages). Prerequisites: Basic data analysis or programming Time and Location: 9am-12pm, Chambers Hall, Lower Level Practice Session: None |
August 16 | Python: Preparing Text as Data You’ve collected or received your text data and need to clean them for analysis. In this workshop we’ll go over the types of cleaning you might need to do given your research question, and how to do it. Things you’ll learn in this workshop:
Prerequisites: Familiarity with Python and Jupyter Notebooks Time and Location: 9am-12pm, Chambers Hall, Lower Level Practice Session: 1:30pm, location tbd |
Chicago
Date | Description |
---|---|
July 12 | Python: Pandas Do you work with data of different types (numerical, categorical, text, dates)? Does your data have labelled variables or observations? Pandas DataFrames are the way to work with mixed and labeled data in Python. Pandas, the Python data analysis library, lets you perform common data analysis tasks such as subsetting and aggregating data, creating new variables, recoding data, computing summary statistics, and visualizing your data. Things you’ll learn in this workshop:
Prerequisites: Familiarity with Python and Jupyter Notebooks Time and Location: 9am-12pm, Galter Health Sciences Library & Learning Center, Large Classroom. Note: food and uncovered drinks are not permitted in Galter Library, so no refreshments will be provided for this workshop. Practice Session: 1:30pm, Galter Health Sciences Library & Learning Center, meet by the Large Classroom Participants may also be interested in the Python: Data Visualization workshop on the Evanston campus to go beyond the few plotting basics included in this workshop. |
July 17 | R: Refresher Has it been a while since you used R? Maybe you used it for a project or course, but never really learned the basics? Are you familiar with data analysis and scripting and looking for a fast-paced introduction? This workshop will help you remember what you may have once learned and fill in the gaps to get you working with R again. Things you’ll (re)learn in this workshop:
Prerequisites: You’ve used R at least once. Those completely new to R should consider completing DataCamp’s free Introduction to R course online before this workshop, or take the Introduction to R workshop offered on the Evanston campus instead. Time and Location: 9am-12pm, Galter Health Sciences Library & Learning Center, Large Classroom. Note: food and uncovered drinks are not permitted in Galter Library, so no refreshments will be provided for this workshop. Practice Session: None |
July 17 | R: Visualization with ggplot2 NOTE: This is an afternoon workshop. ggplot2 is one of the most popular R packages – for good reason. It’s a data visualization framework that logically associates features of your data with elements of the plot. Once you’ve learned the basic ggplot2 syntax for making a plot, you can adapt it to make new visualizations with different plot types easily. ggplot2 makes plotting multiple data series with different lines or marker types straightforward, and it lets you create multiple plots for different groups in your data with one command. Whether you’re creating data visualizations for data exploration or publication, come learn why The New York Times and FiveThirtyEight both use ggplot2. Things you’ll learn in this workshop:
Prerequisites: Experience with R of at least the level of the R: Refresher workshop or Introduction to R workshop. Time and Location: 1pm-4pm, Galter Health Sciences Library & Learning Center, Large Classroom. Note: food and uncovered drinks are not permitted in Galter Library, so no refreshments will be provided for this workshop. Practice Session: None The content of this session is the same as that of the ggplot2 workshop on the Evanston campus. |
July 19 | Python: NumPy and SciPy NumPy and SciPy are the core libraries for scientific computing with Python. If you want to work with matrices in Python, these are the packages for you. NumPy and SciPy are also the foundation of higher level data science and machine learning packages. If you plan to use packages like TensorFlow or Scikit-learn for machine learning, understanding these packages will help you shape and work with data for both input and output. Things you’ll learn in this workshop:
Prerequisites: Familiarity with Python and Jupyter Notebooks Time and Location: 9am-12pm, Galter Health Sciences Library & Learning Center, Large Classroom. Note: food and uncovered drinks are not permitted in Galter Library, so no refreshments will be provided for this workshop. Practice Session: CANCELLED The topics covered in this session are the same as those covered in the NumPy and SciPy workshop on the Evanston campus. |
July 26 | R: Statistical and Machine Learning Models Ready to use R to analyze your data? If you’ll be using regression or classification models, this workshop will help you get started. Things you’ll learn in this workshop:
Prerequisites:
Time and Location: 9am-12pm, Wieboldt Hall, Room 506 Practice Session: 1:30pm, Galter Health Sciences Library & Learning Center, meet by the Large Classroom |
August 7 | R: Gene Expression Analysis In this workshop, you’ll learn how to analyze RNA-seq gene expression data to identify candidate genes and pathways using R. Things you’ll learn in this workshop:
Prerequisites:
Time and Location: 9am-12pm, Robert H. Lurie Medical Research Center, Baldwin Auditorium Practice Session: 1:30pm, Galter Health Sciences Library & Learning Center, meet by the Large Classroom |