Workshop Schedule

The Summer 2018 workshops are over. We’ll be back next summer with more workshops! See the Other Options for Learning page for resources available during the academic year.

 

Registration is required for all workshops; see links below. You’re welcome to attend one workshop or many, but you must register individually for each workshop. Please join the waitlist if a workshop is full, as there may be cancellations or opportunities for additional sessions.

There is a small registration fee ($5-$20) for all workshops to promote attendance and help cover the cost of light refreshments (coffee/tea and snacks) during the workshops. If the fee would prevent you from attending, please contact us for options.

A NetID is required to register. Workshops are open to Northwestern graduate students, postdocs, faculty, and staff. Workshop instructors, helpers, and participants are expected to adhere to the Code of Conduct.

All workshops require that you bring a laptop with a wifi connection, charged battery, and the necessary software installed ahead of the workshop. Workshop instructors and helpers will NOT be able to help with software installation during the workshop. See Software Installation for instructions.

Workshop materials will be publicly available online; they will be listed on the Workshop Materials page when available.

Trying to decide which workshops to take?

If you don’t know Python or R, we recommend you pick one language to learn well this summer. If your work primarily involves statistical analysis, exploratory data analysis, or data visualization, or you’re in a field with established R packages for analysis, you may want to start with R. If your work involves working with text data, collecting data from the web, working with numerical matrices, writing scripts, automating workflows, or building applications, you may want to start with Python. However, both languages ultimately have similar capabilities and are useful for data science and data analysis tasks.

The command line, databases, and data strategies workshops teach useful skills for both R and Python users.

The workshop descriptions include information on what previous experience participants are expected to have. Most Python and R workshops beyond the introductory ones expect participants will know the skills from the introductory workshops. However, you do not necessarily have to take the introductory workshops to acquire these skills. Your previous experience may be sufficient, or you may use online resources to learn the basics. Even if you don’t register for an introductory workshop, you are welcome to work through the workshop materials and exercises on your own as a refresher, or attend the afternoon practice sessions to work with others.

Other Options

If you’d like to learn Python in a concentrated 2-week course, consider signing up for NICO 101 instead of the summer workshops. The course is open to folks other than undergraduates; contact Prof. Pah for details.

Faculty: please register for any workshops you’re interested in. In addition, Research Computing Services is exploring the possibility of a faculty-only bootcamp in September that would mix tutorials with work on your own projects. If you’re interested in this, please let us know.

Workshops by Topic

Looking for a workshop on a topic not listed here? Let us know.

Python

R

Other

Workshop Calendar

June 2018

Mon 25: Intro to Programming with Python (1/3)
Tue 26: R: Introduction (1/2)
Wed 27: Intro to Programming with Python (2/3)
Thu 28: R: Introduction (2/2)
Fri 29: Intro to Programming with Python (3/3)

July 2018

Fri 6: R: Intro to Shiny
Mon 9: Python: Data Visualization
Tue 10: Python: Pandas
Wed 11: Intro to Command Line
Thu 12: Python: Pandas
Mon 16: R: Intro to the Tidyverse (1/2)
Tue 17: R: Refresher (am); R: Visualization with ggplot2 (pm)
Wed 18: R: Intro to the Tidyverse (2/2)
Thu 19: Python: NumPy and SciPy
Fri 20: Python: NumPy and SciPy
Tue 24: R: Visualization with ggplot2
Wed 25: R: Statistical Modeling and Learning
Thu 26: R: Statistical Models
Fri 27: Python: APIs and Web Requests
Mon 30: Intro to Databases and SQL

August 2018

Fri 3: Python: Machine Learning with Scikit-learn
Tue 7: R: Gene Expression Analysis
Wed 8: Intro to ArcGIS
Thu 9: Data Strategies
Thu 16: Python: Preparing Text as Data

Workshop Descriptions

Skip to Chicago

Evanston

Workshop practice session locations are subject to change.

Date Description
June 25, 27, 29

REGISTER

Intro to Programming with Python
If you’re new to programming in any language, this workshop is for you. Over three sessions, we’ll cover the basics of writing code, using Python as the language. Participants are expected to attend all three days. This workshop is for those with limited prior programming experience and will be paced accordingly. If you already know how to program, and you’re looking to learn Python, see the Resources page for alternative courses and tutorials.

Things you’ll learn in this workshop:

  • Create variables and objects
  • Read and write files
  • Use conditional statements and loops
  • Write functions
  • Import and use packages
  • Run your code in Jupyter Notebooks or as a script
  • Read documentation and get more help

Prerequisites: None

Time and Location: 9am-12pm, ITW Classroom, 1-350, Ford Motor Company Engineering Design Center

Practice Session: 1:30pm, Mudd Library, Large Classroom, Room 2210

June 26, 28

REGISTER

R: Introduction
R is a programming language and environment designed for statistical computing. If you do statistical analysis, data visualization, or data manipulation, it’s a powerful tool that is used by academics and data scientists alike. It supports reproducible workflows and helps you take analysis beyond using built-in models on relatively clean data sets as you might with other statistical analysis programs. If you’ve never used R before, or you are looking for a review of the basics, start here. This is a two-session workshop, and participants are expected to attend both days. Participants are encouraged to continue learning R with R: Intro to the Tidyverse, which teaches packages that make data manipulation tasks easier, and R: Visualization with ggplot2 to learn more about data visualization in R. Together, these workshops will give you a well-rounded introduction to R.

Things you’ll learn in this workshop:

  • How to use RStudio and projects
  • Import and manipulate data
  • Basic plotting
  • How to write an R script
  • Aggregating and summarizing data
  • Import and use packages
  • Read documentation and get more help

Prerequisites: None

Time and Location: 9am-12pm, 555 Clark St., B01

Practice Session: 1:30pm, Mudd Library, Large Classroom, Room 2210

July 6

REGISTER

R: Intro to Shiny
Shiny is an R package for making interactive visualizations and web applications with R. Tell stories with your data, and let your users interact with your data and analysis. Create dashboards, reports, or standalone websites.

Things you’ll learn in this workshop:

  • Create and share a Shiny application
  • Change data visualizations in response to user input
  • Customize the appearance of your application
  • Add text and multiple panels to your application

Prerequisites: Familiarity with R, including making plots

Time and Location: 9am-12pm, Mudd Library, Small Classroom, Room 2124

Practice Session: None

July 9

REGISTER

Python: Data Visualization
If you’re analyzing data with Python, then you need to be able to visualize your data as well. You can plot pandas data frames directly, but for certain chart types, formats, and options, you need to use the underlying matplotlib library directly. Learning matplotlib will help you with creating other specialized data visualizations in Python as well, as most Python data visualization libraries are based on it. We’ll look at Seaborn, a library for statistical data visualization, as one example of a specialized plotting library.

Things you’ll learn in this workshop:

  • Read data into a pandas data frame for plotting
  • Plotting with matplotlib
  • How to create several different types of plots
  • Statistical data visualizations with Seaborn

Prerequisites: Familiarity with Python and Jupyter Notebooks

Time and Location: 9am-12pm, Chambers Hall, Lower Level

Practice Session: 1:30pm, Main Library, B183

July 10

REGISTER

Python: Pandas
Do you work with data of different types (numerical, categorical, text, dates)? Does your data have labeled variables or observations? Pandas DataFrames are the way to work with mixed and labeled data in Python. Pandas, the Python data analysis library, lets you perform common data analysis tasks such as subsetting and aggregating data, creating new variables, recoding data, computing summary statistics, and visualizing your data.

Things you’ll learn in this workshop:

  • Importing data into a pandas DataFrame
  • Subsetting and manipulating data sets
  • Grouping and aggregating data
  • Computing summary measures
  • Basic plotting
  • Merging DataFrames
  • Indices and hierarchical indices

Prerequisites: Familiarity with Python and Jupyter Notebooks

Time and Location: 9am-12pm, Chambers Hall, Lower Level

Practice Session: 1:30pm, Main Library, B183

This workshop is also offered on the Chicago campus with the same content.

Participants may also be interested in the Python: Data Visualization workshop to go beyond the few plotting basics included in this workshop.

July 11

REGISTER

Intro to the Command Line
Using the command line (aka Unix shell, or terminal) is fundamental to using both your computer and more powerful cloud and cluster computational resources effectively. Whether installing and managing Python packages, running Python or R scripts, interacting with databases, using version control systems like git, running machine learning models in the cloud, or using Quest (Northwestern’s high-performance supercomputer), being comfortable with the command line will open up new computational possibilities. Data scientists and researchers alike will work more efficiently with command line skills in their toolkit.

Things you’ll learn in this workshop:

  • Navigate a file system
  • List information about files and directories
  • Create, copy, and move files
  • Edit files and view their contents
  • Combine commands with pipes
  • Find files and things in files

Prerequisites: None

Time and Location: 9am-12pm, Chambers Hall, Lower Level

Practice Session: 1:30pm, Mudd Library, Small Classroom, Room 2124

July 16, 18

REGISTER

R: Intro to the Tidyverse
The tidyverse is a set of R packages designed to make data manipulation and analysis tasks easier and more consistent. Whether you’re brand new to R or have been using R for years, the tidyverse packages can help you improve your workflow. This workshop will build off of the R for Data Science book by Hadley Wickham and Garrett Grolemund. This is a two-session workshop; participants are expected to attend both days. Workshop participants may also be interested in R: Visualization with ggplot2 to learn the ggplot2 package in more depth than what will be covered in this workshop.

Things you’ll learn in this workshop:

  • Visualize data with ggplot2
  • Import data with readr
  • Data manipulation with dplyr
  • Reshaping data with tidyr
  • Working with dates and text with lubridate and stringr
  • Repeat statistical tests and bootstrap with infer

Prerequisites: This workshop is suitable for beginners, but it’s recommended that you have experience with R at least at the level of the Introduction to R workshop.

Time and Location: 9am-12pm, Chambers Hall, Lower Level

Practice Session: 1:30pm, Mudd Library, Small Classroom, Room 2124

Materials are available here.

July 20

REGISTER

Python: NumPy and SciPy
NumPy and SciPy are the core libraries for scientific computing with Python. If you want to work with matrices in Python, these are the packages for you. NumPy and SciPy are also the foundation of higher-level data science and machine learning packages. If you plan to use packages like TensorFlow or Scikit-learn for machine learning, understanding these packages will help you shape and work with data for both input and output.

Things you’ll learn in this workshop:

  • Create and manipulate multidimensional NumPy arrays
  • Import data
  • Do linear algebra in Python
  • Use SciPy’s basic statistical functions
  • Use array operations to speed up your calculations

Prerequisites: Familiarity with Python and Jupyter Notebooks

Time and Location: 9am-12pm, Chambers Hall, Lower Level

Practice Session: CANCELLED

This workshop is also offered on the Chicago campus.

July 24

REGISTER

R: Visualization with ggplot2
ggplot2 is one of the most popular R packages – for good reason. It’s a data visualization framework that logically associates features of your data with elements of the plot. Once you’ve learned the basic ggplot2 syntax for making a plot, you can adapt it to make new visualizations with different plot types easily. ggplot2 makes plotting multiple data series with different lines or marker types straightforward, and it lets you create multiple plots for different groups in your data with one command. Whether you’re creating data visualizations for data exploration or publication, come learn why The New York Times and FiveThirtyEight both use ggplot2.

Things you’ll learn in this workshop:

  • Basics of plotting with ggplot2
  • How to change plot types and options
  • Reshaping data before plotting
  • Creating multiple plots with facets
  • Interacting with plots
  • Packages that extend ggplot2

Prerequisites: Experience with R of at least the level of the Introduction to R workshop.

Time and Location: 9am-12pm, ITW Classroom, 1-350 Ford Motor Company Engineering Design Center

Practice Session: 1:30pm, Mudd Library, Large Classroom, Room 2210

This workshop is also offered on the Chicago campus with the same content.

July 25

REGISTER

R: Statistical Modeling and Learning
You’ve learned the basics of R, but now you need to know how to actually do your analysis. If that might include running some type of regression model, this is the workshop for you. Whether you want to use regression for statistical inference or prediction, this workshop will cover the basics to help you get started.

Things you’ll learn in this workshop:

  • Use R’s formula syntax to specify models
  • Run common regression models in R
  • Read model fit summaries and output
  • Make predictions with regression models in R
  • Basic model diagnostic tools

Prerequisites:

Time and Location: 9am-12pm, Mudd Library, Large Classroom, Room 2210

Practice Session: 1:30pm, Mudd Library, Large Classroom, Room 2210

A similar workshop is also offered on the Chicago campus, but with content tailored to life sciences researchers.

July 27

REGISTER

Python: APIs and Web Requests
If you need to collect data from a website or public platform, you’ll need to use APIs (application programming interfaces) or web requests. In this workshop, you’ll learn the basics of both paradigms, and we’ll cover some basic scraping etiquette.

Things you’ll learn in this workshop:

  • When to use APIs or web requests
  • How to query APIs and parse their responses
  • Basics of the Twitter API
  • How to scrape a site with HTML requests
  • How to parse HTML with the beautifulsoup package
  • Requesting etiquette

Prerequisites: Familiarity with Python and Jupyter Notebooks

Time and Location: 9am-12pm, Chambers Hall, Lower Level

Practice Session: 1:30pm, location tbd

July 30

REGISTER

Intro to Databases and SQL
NOTE: This is an all-day workshop. Lunch will be provided.

Have you heard about databases but aren’t really sure what they are? Is your data growing beyond what you can manage in Excel? Do you need to learn SQL to access databases you want to use? Have you heard that data scientists use SQL and want to learn more? This workshop will cover the basics of database design and show you how to use SQL to work with relational databases.

Things you’ll learn in this workshop:

  • Principles of database design
  • SQL
  • Selecting, filtering, and aggregating data
  • Joining database tables together
  • Creating a database and defining tables
  • Inserting, updating, and deleting data
  • Importing and exporting data
  • Querying a database from R or Python

Prerequisites: None

Time and Location: 9am-4pm, Chambers Hall, Lower Level

Practice Session: None

August 3

REGISTER

Python: Scikit-learn
Scikit-learn is the core machine learning library for Python. It allows you to run a wide range of classification, clustering, regression, and prediction algorithms all using the same framework. It includes tools for splitting your data into test and training sets, parameter estimation using grid search and cross validation, evaluating your models, and making predictions. It works with NumPy and SciPy. If you want to use Python for predictive modeling, Scikit-learn is the place to start.

Things you’ll learn in this workshop:

  • Scikit-learn workflow
  • Creating test and training data sets
  • Fitting both supervised and unsupervised models
  • Maximizing model performance with parameter grid search and cross validation
  • Evaluating model fit
  • Predicting outcomes for new data

Prerequisites:

  • Basic Python and how to use Jupyter Notebooks.
  • Familiarity with the process of building predictive models for classification, clustering, regression, or other applications (separate from Python); this workshop covers how to use Python for machine learning, not the theories behind machine learning

Time and Location: 9am-12pm, Chambers Hall, Lower Level

Practice Session: 1:30pm, location tbd

August 8

REGISTER

GIS: Introduction
A geographic information system (GIS) is a framework for gathering, managing, and analyzing data. It is based in the science of geography and integrates many types of data. It analyzes spatial location and organizes layers of information into visualizations using maps. With this unique capability, GIS can help you gain insights into patterns and relationships in your data that you may miss using other statistical programs. If you’ve never used GIS before, or you are looking for a review of the basics, this is a great place to start.

Things you’ll learn in this workshop:

  • Create a map using ArcGIS Pro
  • Create a map using QGIS
  • Create a web map using ArcGIS Online
  • Import and manipulate data in each program
  • Translate a table of data into a map
  • Perform basic geospatial operations
    • spatial joins
    • buffer analysis
    • table joins
    • geocoding
    • and more…

Prerequisites: None

Time and Location: 9am-12pm, Mudd Library, Large Classroom, Room 2210

Practice Session: The library GIS team will be available in Mudd Library in the afternoon if you have additional questions or want to work on your own projects.

Note: Computers with the necessary software will be provided for this workshop. You do not need to install anything ahead of time.

August 9

REGISTER

Data Strategies
You’ve taken some of the other workshops this summer and learned new skills. But how exactly are you going to apply those skills to your data and research? This workshop will use examples of real data problems members of the community face to demonstrate how to put your Python, R, or other skills to use on real problems. Have an idea or an example of a challenge you’re facing in your own work? Email Thomas Stoeger with your questions, ideas, or (small or large) data strategy problems you have encountered.

This workshop is language agnostic and will be beneficial to folks using R or Python (or other languages).

Prerequisites: Basic data analysis or programming

Time and Location: 9am-12pm, Chambers Hall, Lower Level

Practice Session: None

August 16

REGISTER

Python: Preparing Text as Data
You’ve collected or received your text data and need to clean them for analysis. In this workshop we’ll go over the types of cleaning you might need to do given your research question, and how to do it.

Things you’ll learn in this workshop:

  • Tokenization
  • (Foreign) language detection
  • Stemming and lemmatization
  • Stoplisting (removing some words)
  • Classifying words by semantic type (e.g. emotional, rational)

Prerequisites: Familiarity with Python and Jupyter Notebooks

Time and Location: 9am-12pm, Chambers Hall, Lower Level

Practice Session: 1:30pm, location tbd

Chicago

Date Description
July 12

REGISTER

Python: Pandas
Do you work with data of different types (numerical, categorical, text, dates)? Does your data have labeled variables or observations? Pandas DataFrames are the way to work with mixed and labeled data in Python. Pandas, the Python data analysis library, lets you perform common data analysis tasks such as subsetting and aggregating data, creating new variables, recoding data, computing summary statistics, and visualizing your data.

Things you’ll learn in this workshop:

  • Importing data into a pandas DataFrame
  • Subsetting and manipulating data sets
  • Grouping and aggregating data
  • Computing summary measures
  • Basic plotting
  • Merging DataFrames
  • Indices and hierarchical indices

Prerequisites: Familiarity with Python and Jupyter Notebooks

Time and Location: 9am-12pm, Galter Health Sciences Library & Learning Center, Large Classroom. Note: food and uncovered drinks are not permitted in Galter Library, so no refreshments will be provided for this workshop.

Practice Session: 1:30pm, Galter Health Sciences Library & Learning Center, meet by the Large Classroom

Participants may also be interested in the Python: Data Visualization workshop on the Evanston campus to go beyond the few plotting basics included in this workshop.

July 17

REGISTER

R: Refresher
Has it been a while since you used R? Maybe you used it for a project or course, but never really learned the basics? Are you familiar with data analysis and scripting and looking for a fast-paced introduction? This workshop will help you remember what you may have once learned and fill in the gaps to get you working with R again.

Things you’ll (re)learn in this workshop:

  • How to use RStudio
  • Import and manipulate data
  • Basic plotting
  • How to write an R script
  • Aggregating and summarizing data
  • Import and use packages
  • Read documentation and get more help

Prerequisites: You’ve used R at least once. Those completely new to R should consider completing DataCamp’s free Introduction to R course online before this workshop, or take the Introduction to R workshop offered on the Evanston campus instead.

Time and Location: 9am-12pm, Galter Health Sciences Library & Learning Center, Large Classroom. Note: food and uncovered drinks are not permitted in Galter Library, so no refreshments will be provided for this workshop.

Practice Session: None

July 17

REGISTER

R: Visualization with ggplot2
NOTE: This is an afternoon workshop.

ggplot2 is one of the most popular R packages – for good reason. It’s a data visualization framework that logically associates features of your data with elements of the plot. Once you’ve learned the basic ggplot2 syntax for making a plot, you can adapt it to make new visualizations with different plot types easily. ggplot2 makes plotting multiple data series with different lines or marker types straightforward, and it lets you create multiple plots for different groups in your data with one command. Whether you’re creating data visualizations for data exploration or publication, come learn why The New York Times and FiveThirtyEight both use ggplot2.

Things you’ll learn in this workshop:

  • Basics of plotting with ggplot2
  • How to change plot types and options
  • Reshaping data before plotting
  • Creating multiple plots with facets
  • Interacting with plots
  • Packages that extend ggplot2

Prerequisites: Experience with R of at least the level of the R: Refresher workshop or Introduction to R workshop.

Time and Location: 1pm-4pm, Galter Health Sciences Library & Learning Center, Large Classroom. Note: food and uncovered drinks are not permitted in Galter Library, so no refreshments will be provided for this workshop.

Practice Session: None

The content of this session is the same as that of the ggplot2 workshop on the Evanston campus.

July 19

REGISTER

Python: NumPy and SciPy
NumPy and SciPy are the core libraries for scientific computing with Python. If you want to work with matrices in Python, these are the packages for you. NumPy and SciPy are also the foundation of higher-level data science and machine learning packages. If you plan to use packages like TensorFlow or Scikit-learn for machine learning, understanding these packages will help you shape and work with data for both input and output.

Things you’ll learn in this workshop:

  • Create and manipulate multidimensional NumPy arrays
  • Import data
  • Do linear algebra in Python
  • Use SciPy’s basic statistical functions
  • Use array operations to speed up your calculations

Prerequisites: Familiarity with Python and Jupyter Notebooks

Time and Location: 9am-12pm, Galter Health Sciences Library & Learning Center, Large Classroom. Note: food and uncovered drinks are not permitted in Galter Library, so no refreshments will be provided for this workshop.

Practice Session: CANCELLED

The topics covered in this session are the same as those covered in the NumPy and SciPy workshop on the Evanston campus.

July 26

REGISTER

R: Statistical and Machine Learning Models
Ready to use R to analyze your data? If you’ll be using regression or classification models, this workshop will help you get started.

Things you’ll learn in this workshop:

  • Run and interpret the results of regression models.
  • How to use the caret package for machine learning and predictive analytics

Prerequisites:

  • Experience with R of at least the level of the R: Refresher workshop or Introduction to R workshop.
  • Familiarity with statistical and/or classification models (separate from R): the theory behind these models is outside the scope of this workshop.

Time and Location: 9am-12pm, Wieboldt Hall, Room 506

Practice Session: 1:30pm, Galter Health Sciences Library & Learning Center, meet by the Large Classroom

August 7

REGISTER

R: Gene Expression Analysis
In this workshop, you’ll learn how to analyze RNA-seq gene expression data to identify candidate genes and pathways using R.

Things you’ll learn in this workshop:

  • Perform quality control on raw RNA-seq count data
  • Obtain a list of differentially expressed genes
  • Visualize expression patterns
  • Perform functional analyses on candidate genes

Prerequisites:

Time and Location: 9am-12pm, Robert H. Lurie Medical Research Center, Baldwin Auditorium

Practice Session: 1:30pm, Galter Health Sciences Library & Learning Center, meet by the Large Classroom