Schedule

Registration is required for all workshops; see links below. You’re welcome to attend one workshop or many, but you must register individually for each workshop. Some workshops consist of multiple sessions over several days. Please join the waitlist if a workshop is full, as there may be cancellations or opportunities for additional sessions.

There is a small registration fee ($5/session = $5-$15 per workshop) for most workshops to promote attendance. If the fee would prevent you from attending, please contact us for options.

A NetID is required to register. Workshops are open to Northwestern graduate students, postdocs, faculty, and staff.

All workshops require that you bring a laptop with a wifi connection, charged battery, and the necessary software installed ahead of the workshop. Workshop instructors and helpers will NOT be able to help with software installation during the workshop. See Software Installation for instructions.

Workshops on the Chicago campus are being planned for September. They will likely include a subset of the R and Python workshops below. Look for an announcement in July.

Evanston Workshop Listing

R

Python

SQL

Other

Evanston Workshop Calendar

June 2019

Mon 24: Fundamental Programming Concepts
Tue 25: R: Introduction (1/3)
Wed 26: Command Line: Introduction
Thu 27: R: Introduction (2/3); Command Line: Bash Scripting; Command Line: Working with Data Files
Fri 28: R: Introduction (3/3)

July 2019

Mon 1: R: Intro for Stata Users (1/2)
Tue 2: R: Intro for Stata Users (2/2)

Tue 9: Fundamental Programming Concepts
Wed 10: Python: Introduction (1/3); Quest Introduction
Thu 11: Python: Introduction (2/3); Parallelization Strategies
Fri 12: Python: Introduction (3/3); Advanced MATLAB

Mon 15: Python Next Steps: Functions and Scripting (1/2)
Wed 17: Python Next Steps: Functions and Scripting (2/2)
Thu 18: Python: Data Manipulation with Pandas

Mon 22: R: Visualization with ggplot2
Tue 23: Git and GitHub: Introduction
Thu 25: Git and GitHub: Next Steps; R: Predictive Modeling
Fri 26: Python: Data Visualization

Mon 29: R: Data Manipulation with the Tidyverse (1/2)
Tue 30: R: Data Manipulation with the Tidyverse (2/2)
Wed 31: R: Statistical Modeling

August 2019

Mon 5: SQL: Selecting and Joining Data; SQL: Exploring Data
Wed 7: Python: Topic Models for Text
Thu 8: Data Strategies

Mon 12: SQL: Updating and Changing Data; Python: NumPy/SciPy
Tue 13: SQL: Designing and Creating Databases; Python: Predictive Modeling with Scikit-learn (1/2)
Wed 14: Python: Predictive Modeling with Scikit-learn (2/2)

Mon 19: Python: TensorFlow (1/2)
Tue 20: Python: TensorFlow (2/2)

Workshop Descriptions

Date Description
June 24
9a-12p

REGISTER
July 9
9a-12p

REGISTER
Fundamental Programming Concepts

New to programming? Want to learn R or Python? When you’re getting started, you’ll need to learn more than just the language you’re interested in. There are fundamental concepts and terms that are shared across programming languages. They will help you understand how to give instructions to computers. Getting familiar with these ideas will help you get the most out of the Python and R introductory workshops.

Things you’ll learn in this workshop:

  • How filesystem paths work
  • Variables: how they work
  • Data types: integer, double/numeric/float, strings, boolean
  • Comparisons and conditional tests
  • If-then-else statements
  • Loops
  • Lists/arrays/vectors
  • Functions: calling them, arguments and parameters, return values
  • Interactive coding vs. scripts
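
As a rough illustration of how several of these concepts fit together, here is a minimal Python sketch. It is not workshop material: the function name total_cost and all variable names are made up for this example, and the fee of 5 simply echoes the per-session registration fee mentioned above.

```python
# Variables hold values, and every value has a data type.
workshop = "Fundamental Programming Concepts"   # string
fee_per_session = 5                             # integer
hours = 3.0                                     # float (double/numeric in R)
is_full = False                                 # boolean


def total_cost(n_sessions, fee):
    """A function with two parameters and one return value."""
    return n_sessions * fee


fees = []                               # an empty list
for n_sessions in [1, 2, 3]:            # loop over the values in a list
    cost = total_cost(n_sessions, fee_per_session)   # call with arguments
    if cost > 10:                       # comparison inside if-then-else
        label = "high"
    else:
        label = "low"
    fees.append((n_sessions, cost, label))

print(fees)
```

You could type these lines one at a time in an interactive session, or save them in a file and run the file as a script.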

Prerequisites: None

June 24th Location: Mudd Library, Large Classroom 2210
July 9th Location: Mudd Library, Small Classroom 2124

July 10, 11, 12
9a-12p

REGISTER
Python: Intro

Learn the basics of the Python programming language as a foundation for writing scripts or conducting data analysis. This workshop is appropriate for beginners, but it is assumed that participants will be familiar with concepts covered in the Fundamental Programming Concepts workshop. If you are new to programming, please sign up for the Fundamental Programming Concepts workshop as well. Participants are expected to attend all three days.

Things you’ll learn in this workshop:

  • Create variables and objects
  • Read and write files
  • Use conditional statements and loops
  • Import and use packages
  • Work with a dataset
  • Run your code in Jupyter Notebooks or as a script
  • Read documentation and get more help

Prerequisites: Fundamental Programming Concepts or equivalent knowledge of the concepts covered (separate from Python) before this workshop.

Location: Mudd Library, Large Classroom 2210

July 15, 17
9a-12p

REGISTER
Python: Next Steps: Functions and Scripting

Wondering what if __name__ == "__main__" does? Are you ready to break up your code into functions? Need to write a script you can call from the command line? Want to break your code up into multiple files so you can reuse functions? Are you ready to make your code reusable and easy to read? Then this two-part workshop will help you take the next steps with Python; participants are expected to attend both days.

Things you’ll learn in this workshop:

  • When, why, and how to define functions
  • Writing Python scripts
  • Calling scripts with arguments from the command line
  • Splitting code across multiple files
  • sys module
  • Python style guide
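
For a sense of what these topics look like together, here is a minimal sketch of a script that can be run from the command line or imported from another file. It is an illustration only: the file name mean_cli.py and the functions mean and main are hypothetical, not taken from the workshop.

```python
import sys


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def main(argv):
    """Read numbers from command-line arguments and print their mean."""
    try:
        numbers = [float(arg) for arg in argv[1:]]   # argv[0] is the script name
    except ValueError:
        numbers = []
    if not numbers:
        print("usage: python mean_cli.py NUMBER [NUMBER ...]")
        return 1
    print(mean(numbers))
    return 0


if __name__ == "__main__":
    # True only when the file is run as a script. When another file does
    # "import mean_cli", this block is skipped, so mean() can be reused
    # without triggering the command-line behavior.
    main(sys.argv)
```

Saved as mean_cli.py, running `python mean_cli.py 2 4` would print 3.0, while `from mean_cli import mean` in another file gives access to the function alone.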

Prerequisites: Python: Intro or equivalent knowledge.

Location: Mudd Library, Large Classroom 2210

July 26
9a-12p

REGISTER
Python: Data Visualization

If you’re analyzing data with Python, then you need to be able to visualize your data as well. You can plot pandas data frames directly, but for certain chart types, formats, and options, you need to use the underlying matplotlib library. Learning matplotlib will help you with creating other specialized data visualizations in Python as well, as most Python data visualization libraries are based on it. We’ll look at Seaborn, a library for statistical data visualization, as one example of a specialized plotting library.

Things you’ll learn in this workshop:

  • Read data into a pandas data frame for plotting
  • Plotting with matplotlib
  • How to create several different types of plots
  • Statistical data visualizations with Seaborn

Prerequisites: Familiarity with Python and Jupyter Notebooks

Location: Chambers Hall, 600 Foster St., Lower level

July 18
1p-4p

REGISTER
Python: Pandas

Do you work with data of different types (numerical, categorical, text, dates)? Does your data have labelled variables or observations? Pandas DataFrames are the way to work with mixed and labeled data in Python. Pandas, the Python data analysis library, lets you perform common data analysis tasks such as subsetting and aggregating data, creating new variables, recoding data, computing summary statistics, and visualizing your data.

Things you’ll learn in this workshop:

  • Importing data into a pandas DataFrame
  • Subsetting and manipulating data sets
  • Grouping and aggregating data
  • Computing summary measures
  • Basic plotting
  • Merging DataFrames
  • Indices and hierarchical indices

Prerequisites: Familiarity with Python and Jupyter Notebooks

Location: Chambers Hall, 600 Foster St., Lower level

August 12
1p-4p

REGISTER
Python: NumPy and SciPy

NumPy and SciPy are the core libraries for scientific computing with Python. If you want to work with matrices in Python, these are the packages for you. NumPy and SciPy are also the foundation of higher level data science and machine learning packages. If you plan to use packages like TensorFlow or Scikit-learn for machine learning, understanding these packages will help you shape and work with data for both input and output.

Things you’ll learn in this workshop:

  • Create and manipulate multidimensional NumPy arrays
  • Import data
  • Do linear algebra in Python
  • Use SciPy’s basic statistical functions
  • Use array operations to speed up your calculations

Prerequisites: Familiarity with Python and Jupyter Notebooks

Location: Chambers Hall, 600 Foster St., Lower level

August 13, 14
1p-4p

REGISTER
Python: Predictive Modeling with Scikit-learn

Do you want to learn from existing data, describe the world and make predictions? To do this you need to learn how to preprocess data, choose and build predictive models, tune model parameters and determine how well the model will perform on unseen data. In this workshop, we will use Python’s Scikit-learn to perform supervised learning on real world datasets. This is a two-day workshop.

Things you’ll learn in this workshop:

  • Preprocess data to enter the predictive modeling pipeline
  • Choose the appropriate model for your objective
  • Engineer features to maximize predictive power
  • Validate models with the appropriate success metrics
  • Troubleshoot common issues
  • Visualize findings

Prerequisites: Familiarity with Python and Jupyter Notebooks; previous coursework in data analysis, probability, and statistics

Location: Chambers Hall, 600 Foster St., Lower level

 

August 19, 20
1p-4p

REGISTER
Python: Deep Learning with TensorFlow

In this workshop you will learn the foundations of, and practice the skills necessary to do, deep learning with TensorFlow. We will teach the basic tools and vocabulary to get started with deep learning, and walk you through how to use it to address some of the most common machine learning problems such as voice/sound recognition, threat and fraud detection, image recognition and motion detection. This is a two-day workshop.

Things you’ll learn in this workshop:

  • What is TensorFlow and how to set it up
  • What types of problems deep learning can help with
  • What tensors are and how to work with them
  • Build, train and apply fully connected deep neural networks
  • Evaluate model performance

Prerequisites: Familiarity with Python and Jupyter Notebooks; previous coursework in data analysis, probability, and statistics

Location: Chambers Hall, 600 Foster St., Lower level

August 7
1p-4p

REGISTER
Python: Topic Models

Have text you want to analyze? Would you like to know what topics appear in your documents without having to define the topics yourself? Topic modeling allows you to find the latent topical structure of a collection of texts based on the patterns of words in the documents.

Things you’ll learn in this workshop:

  • How to prepare documents for topic analysis
  • How to apply topic models using approaches such as:
    • Latent Dirichlet allocation
    • Non-negative matrix factorization
    • Stochastic block models
  • How to examine and display the results of a topic analysis:
    • Topic coherence
    • Clustering
    • Classification
    • Visual displays

Prerequisites: Familiarity with Python

Location: Chambers Hall, 600 Foster St., Lower level

June 25, 27, 28
9a-12p

REGISTER
R: Introduction

R is a programming language and environment designed for statistical computing. If you do statistical analysis, data visualization, or data manipulation, it’s a powerful tool that is used by academics and data scientists alike. It supports reproducible workflows and helps you take analysis beyond using built-in models on relatively clean data sets as you might with other statistical analysis programs. Participants are expected to attend all three days. If you’re a Stata user, you might want Intro to R for Stata Users instead.

Things you’ll learn in this workshop:

  • How to use RStudio and projects
  • R fundamentals and syntax so you can read R code
  • Import and manipulate data
  • Basic plotting
  • Aggregating and summarizing data
  • Import and use packages
  • Read documentation and get more help

Prerequisites: Fundamental Programming Concepts or equivalent knowledge of the concepts covered (separate from R) before this workshop.

Location: Mudd Library, Small Classroom 2124

July 1, 2
1p-4p
REGISTER
R: Introduction for Stata Users

In this workshop Stata users will become familiar with programming in R. We will learn how to use R to do data management and data analysis. Since this workshop is for those familiar with Stata, we will emphasize how to translate Stata code into R. In addition, we will learn how to use R’s powerful data visualization tools.

Things you’ll learn in this workshop:

  • R basics
  • Difference in logic between Stata and R (objects vs. functions)
  • How to translate common functions used in Stata into R
  • Data visualization in R

Prerequisites: Familiarity with Stata; Fundamental Programming Concepts may be useful, but it’s not required

Location: Mudd Library, Large Classroom 2210

July 22
9a-12p

REGISTER
R: Visualization with ggplot2

ggplot2 is one of the most popular R packages – for good reason. It’s a data visualization framework that logically associates features of your data with elements of the plot. Once you’ve learned the basic ggplot2 syntax for making a plot, you can adapt it to make new visualizations with different plot types easily. ggplot2 makes plotting multiple data series with different lines or marker types straightforward, and it lets you create multiple plots for different groups in your data with one command. Whether you’re creating data visualizations for data exploration or publication, come learn why The New York Times and FiveThirtyEight both use ggplot2.

Things you’ll learn in this workshop:

  • Basics of plotting with ggplot2
  • How to change plot types and options
  • Reshaping data before plotting
  • Creating multiple plots with facets
  • Interacting with plots
  • Packages that extend ggplot2

Prerequisites: Experience with R at or above the level of the Introduction to R workshop.

Location: Mudd Library, Large Classroom 2210

July 29, 30
9a-12p

REGISTER
R: Data Manipulation with the Tidyverse

Familiar with R but want to take your data analysis skills to the next level? Tidyverse is a collection of R packages designed for data scientists to easily handle data import, manipulation, exploration, and visualization – all written with the same core syntax. In this workshop you will gain experience working with real-life messy data and will learn to transform raw data into polished data for further analysis and plotting. Basic familiarity with R is required, but all levels are welcome. This is a two-part workshop.

Things you’ll learn in this workshop:

  • How to recognize clean versus messy data
  • How to tidy messy data with tidyr
  • How to string together data manipulation commands with dplyr
  • How to work with text with stringr
  • Importing data with readr

Prerequisites: Experience with R at or above the level of the Introduction to R workshop.

Location: Chambers Hall, 600 Foster St., Lower level

July 31
9a-12p

REGISTER
R: Statistical Modeling

You’ve learned the basics of R, but now you need to run your statistical analyses. How do you use R for hypothesis testing? What does the regression output mean? How can you get predicted values from a model? This workshop will teach you how to compute some common types of models such as linear regression, logistic regression, and ANOVA and provide you with a foundation for using R for models specific to your research or field.

Things you’ll learn in this workshop:

  • The R formula syntax
  • lm and glm functions for regression models
  • Reading model summary output
  • Working with model result objects and summary objects
  • Getting predicted values from a model

Prerequisites: Experience with R at or above the level of the Introduction to R workshop; previous coursework in statistics, including regression models

Location: Chambers Hall, 600 Foster St., Lower level

July 25
1p-4p
REGISTER
R: Predictive Modeling

Predictive modeling involves using variables to predict outcomes on new data, focusing on predictive accuracy over explanatory depth. Methods frequently included under the umbrella of “machine learning” are included in this definition of predictive modeling and comprise some of the techniques we will learn in this workshop.

Things you’ll learn in this workshop:

  • The rationale behind the use of various predictive modeling strategies: when to use them and what to consider prior to their application.
  • How to implement predictive models such as logistic regression, classification/regression trees, random forests, and boosted trees in R.
  • Basic philosophy behind predictive modeling as it compares to explanatory modeling more frequently implemented in scientific research.

Prerequisites: Experience with R at or above the level of the Introduction to R workshop; previous coursework in statistics

Location: Chambers Hall, 600 Foster St., Lower level

August 5
9a-12p

REGISTER
SQL: Selecting and Joining Data

New to databases? This is the place to start. This workshop will get you started working with data in an existing database. You’ll learn how to navigate the database, see what it contains, and select data. You’ll learn some key concepts and terms to help you expand your SQL and database knowledge later. This workshop uses PostgreSQL, but the concepts are generally applicable to other SQL databases.

Things you’ll learn in this workshop:

  • How a SQL database is structured
  • How to select data from a table
  • How to join tables together
  • Aliasing columns and tables in queries
  • How to use a database client
  • What primary and foreign keys are
  • Grouping, counting, and aggregating
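
To make the ideas of selecting, joining, aliasing, and aggregating concrete, here is a small sketch. The workshop itself uses PostgreSQL; this example instead assumes Python's built-in sqlite3 module so that it runs without a database server. The tables workshops and sessions are hypothetical and exist only for this illustration; the rows reuse dates from the calendar above.

```python
import sqlite3

# In-memory SQLite database: a stand-in for the PostgreSQL server used in class.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workshops (
    id    INTEGER PRIMARY KEY,                      -- primary key
    title TEXT NOT NULL
);
CREATE TABLE sessions (
    id          INTEGER PRIMARY KEY,
    workshop_id INTEGER REFERENCES workshops(id),   -- foreign key
    day         TEXT NOT NULL
);
INSERT INTO workshops VALUES
    (1, 'R: Introduction'),
    (2, 'Command Line: Introduction');
INSERT INTO sessions VALUES
    (1, 1, 'June 25'), (2, 1, 'June 27'), (3, 1, 'June 28'),
    (4, 2, 'June 26');
""")

# Join the two tables on the key, alias tables and columns,
# then group and count the sessions for each workshop.
query = """
SELECT w.title AS workshop, COUNT(s.id) AS n_sessions
FROM workshops AS w
JOIN sessions AS s ON s.workshop_id = w.id
GROUP BY w.title
ORDER BY w.title;
"""
rows = conn.execute(query).fetchall()
for workshop, n_sessions in rows:
    print(workshop, n_sessions)
```

The SELECT, JOIN, and GROUP BY clauses shown here are standard SQL, though details such as data type names differ between SQLite and PostgreSQL.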

Prerequisites: none

Location: Mudd Library, Large Classroom 2210

August 5
1p-4p

REGISTER
SQL: Exploring Data

You know how to select data from a table, but how do you do more than pull rows and columns of data? How can you get summary measures from the database? How can you find text that matches what you’re looking for? What if you need to manipulate the data? This workshop is the second in the SQL series focused on working with existing databases. This workshop uses PostgreSQL, but the concepts are generally applicable to other SQL databases.

Things you’ll learn in this workshop:

  • SQL data types and casting data to other types
  • Numerical aggregation functions such as average and standard deviation
  • Functions for truncating and aggregating date/time data
  • Matching strings with LIKE patterns
  • Subqueries and other ways to build more complex queries

Prerequisites: SQL: Selecting and Joining Data or equivalent knowledge

Location: Mudd Library, Large Classroom 2210

August 12
9a-12p

REGISTER
SQL: Updating and Changing Data

If you need to learn to change the data in a database, this workshop is for you. You’ll learn how to make changes to an existing database, such as updating, inserting, and deleting data. This workshop uses PostgreSQL, but the concepts are generally applicable to other SQL databases.

Things you’ll learn in this workshop:

  • Transactions
  • Update existing values in a database
  • Insert new rows into a database table
  • Delete rows from a database table
  • Delete tables
  • Update and delete data in one table using information in another table
  • Create temporary tables using select queries
  • Alter the structure of existing tables
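
As a rough sketch of why transactions matter when changing data, the example below updates a row and undoes the change if a check fails. It assumes Python's built-in sqlite3 module rather than the PostgreSQL setup used in the workshop, and the seats table, the register function, and the seat counts are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats (workshop TEXT PRIMARY KEY, open_seats INTEGER)")
conn.execute("INSERT INTO seats VALUES ('SQL: Exploring Data', 2)")  # made-up count
conn.commit()


def register(conn, workshop):
    """Take one seat inside a transaction; roll back if none were left."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE seats SET open_seats = open_seats - 1 WHERE workshop = ?",
                (workshop,),
            )
            (remaining,) = conn.execute(
                "SELECT open_seats FROM seats WHERE workshop = ?", (workshop,)
            ).fetchone()
            if remaining < 0:
                raise ValueError("workshop is full")
        return True
    except ValueError:
        return False


# Two registrations succeed; the third is rolled back.
results = [register(conn, "SQL: Exploring Data") for _ in range(3)]
print(results)
```

Because the failed update is rolled back, the table never ends up with a negative number of seats, which is the kind of guarantee a transaction is meant to provide.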

Prerequisites: SQL: Selecting and Joining Data or equivalent knowledge

Location: Mudd Library, Small Classroom 2124

August 13
9a-12p

REGISTER
SQL: Designing and Creating Databases

Ready to make your own database? Trying to decide whether your data should be stored in a database? This workshop will cover basic concepts of database design. You will map out a plan for how data can be stored in a database, create new tables, import data, and define relationships between tables. This is a practical workshop aimed at researchers building databases for their data. This workshop uses PostgreSQL, but the concepts are generally applicable to other SQL databases.

Things you’ll learn in this workshop:

  • Create new tables
  • Constraints
  • Primary and foreign keys
  • Importing data
  • Choosing between data types
  • Principles of database design
  • When to use a database vs. other methods of storing data

Prerequisites: SQL: Updating and Changing Data or equivalent knowledge

Location: Mudd Library, Small Classroom 2124

June 26
1p-4p

REGISTER
Command Line: Introduction

Using the command line (aka Unix shell, or terminal) is fundamental to using both your computer and more powerful cloud and cluster computational resources effectively. Whether installing and managing Python packages, running Python or R scripts, interacting with databases, using version control systems like git, running machine learning models in the cloud, or using Quest (Northwestern’s high-performance supercomputer), being comfortable with the command line will open up new computational possibilities. Data scientists and researchers alike will work more efficiently with command line skills in their toolkit.

Things you’ll learn in this workshop:

  • Navigate a file system
  • List information about files and directories
  • Create, copy, and move files
  • Edit files and view their contents
  • Combine commands with pipes
  • Find files and things in files

Prerequisites: None

Alternatives: Can’t attend this workshop? The workshop is based on Software Carpentry: The Unix Shell which you can work through on your own. There is also a free online course covering the same material: DataCamp: Introduction to Shell for Data Science.

Location: Mudd Library, Small Classroom 2124

June 27
1p-2:30p

REGISTER
Command Line: Bash Scripting

Knowing Bash well lets you write helpful programs using the hundreds of utilities (commands) included with a Linux or UNIX operating system. In this workshop you will learn to write short scripts that can save you hours when performing tasks like renaming files in bulk or separating them based on their timestamps. This workshop is geared towards participants who have some experience working on Unix-like systems (for example, Quest) and are ready to do more.

Things you’ll learn in this workshop:

  • Create reusable command line tools and scripts
  • Manage your workflows
  • Define and use variables
  • Write loops

This workshop is scheduled immediately before Command Line: Working with Data Files. You may be interested in both workshops.

Prerequisites: Familiarity with the command line at the level of Command Line: Introduction; conceptual understanding of loops and variables outside of Bash

Location: Mudd Library, Small Classroom 2124 (1.5-hour workshop)

June 27
2:30p-4p

REGISTER
Command Line: Working with Data Files

Bash can be a powerful tool for researchers and data scientists working with large files or many files. It is also useful for downloading files from websites or remote systems. Taking advantage of utilities to manipulate and explore files and their contents can make you more productive and simplify your work.

Things you’ll learn in this workshop:

  • Downloading data from remote sites with tools such as wget, curl, and scp
  • Searching and editing data files with sed and awk
  • Ways to manage your workflows
  • How to combine multiple commands

This workshop is scheduled immediately after Command Line: Bash Scripting. You may be interested in both workshops.

Prerequisites: Familiarity with the command line at the level of Command Line: Introduction

Location: Mudd Library, Small Classroom 2124 (1.5-hour workshop)

July 23
1p-4p
REGISTER
Git and GitHub: Introduction

A version control system is a useful tool that keeps track of changes to files, including code and text documents. These systems are also crucial for collaborating with other people and merging changes while you work on the same files. Git is a popular modern version control system that is widely used, free, extremely fast and very capable. GitHub is a hosting service for git and a platform for collaboration. Come learn the basics of using git and GitHub.

Things you’ll learn in this workshop:

  • What version control is and when to use it
  • Using git from the command line
  • Sharing files via GitHub

Prerequisites: Familiarity with the command line at the level of Command Line: Introduction, specifically an understanding of your computer’s file system, the ability to locate files from the command line, and the ability to edit text files

Location: Mudd Library, Large Classroom 2210

July 25
9a-12p
REGISTER
Git and GitHub: Intermediate

Do you know what git is but wonder how it’s really used? Are you wondering why and how people use branches and pull requests? Do you have a sense that you could be doing more with git and GitHub than pushing and pulling to the master branch, and resetting your repository every time something goes wrong? Then join us to take the next steps with git and GitHub to use them more effectively for collaboration with others.

Things you’ll learn in this workshop:

  • Branches
  • Pull requests
  • GitHub issues
  • Troubleshooting

Prerequisites: Familiarity with git at the level of Git and GitHub: Introduction

Location: Mudd Library, Small Classroom 2124

July 11
1p-3p
REGISTER
Parallelization Strategies

Do you wonder what people mean by parallelization? Are you looking for ways to speed up your code? How can you best take advantage of Quest (Northwestern’s high-performance supercomputer)? This workshop will introduce you to different options for splitting your code up into smaller processes that can run at the same time. The principles can be applied to many different programming languages (Python, R, C, Fortran, etc.) and examples of code in multiple languages will be provided.

Things you’ll learn in this workshop:

  • Identifying blocks of code that can be run in parallel
  • When different parallelization strategies are appropriate
  • How to adapt scripts to run in parallel
  • Troubleshooting common issues
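
One common pattern is a loop whose iterations do not depend on each other, so each iteration can be handed to a separate worker. The sketch below illustrates that pattern with Python's standard concurrent.futures module; it is an assumption-laden example rather than workshop code, and the function simulate is a hypothetical stand-in for one independent unit of work.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed):
    """Stand-in for one independent task, such as a single simulation run."""
    total = 0
    for i in range(1000):
        total = (total * 31 + seed + i) % 1_000_003
    return total


seeds = list(range(8))

# Serial version: tasks run one after another.
serial = [simulate(seed) for seed in seeds]

# Parallel version: the same tasks are handed to a pool of workers.
# map() returns results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(simulate, seeds))

print(parallel == serial)
```

Threads are used here only to keep the example short. For CPU-heavy pure Python code, threads in standard CPython generally do not give a speedup, and ProcessPoolExecutor from the same module, which offers the same map interface, is the usual alternative.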

Prerequisites: Familiarity with a programming language

Location: Mudd Library, Small Classroom 2124

August 8
1p-4p

REGISTER
Data Strategies

You’ve taken some of the other workshops this summer and learned new skills. But how exactly are you going to apply those skills to your data and research? How do you structure a research project around data? How do you organize code so that you can work efficiently? This workshop will use examples of real data problems members of the community face to demonstrate how to put your Python, R, or other skills to use on real problems. Have an idea or an example of a challenge you’re facing in your own work? Email Thomas Stoeger with your questions, ideas, or (small or large) data strategy problems you have encountered.

This workshop is language agnostic and will be beneficial to folks using R or Python (or other languages).

Prerequisites: Basic data analysis or programming skills

Location: Chambers Hall, 600 Foster St., Lower level

July 12
1p-4p
REGISTER
Advanced MATLAB

MATLAB is a high-level matrix/array language with an intuitive IDE and an interactive graphics system, which makes it an easy language for beginners. MATLAB code, however, is susceptible to suboptimal performance if good programming practices are not followed. In this module we will go over some techniques and tools to optimize code performance in MATLAB.

Things you’ll learn in this workshop:

  • MATLAB data structures
  • Working with MATLAB Profiler
  • Important practices: Preallocation, Vectorization
  • Fast referencing operations
  • Function handles
  • Optimizing memory usage
  • System Objects (MEX codes)
  • Parallel processing
  • GPU acceleration
  • Reading/Writing large files

Prerequisites: Familiarity with MATLAB

Location: Mudd Library, Large Classroom 2210

July 10
1p-2:30p
REGISTER
Quest Introduction

New to Quest or looking to improve your familiarity with it? Come get an overview of the system, learn how to submit jobs, and get familiar with best practices. All researchers are welcome to attend, but you’ll need an active Quest allocation to participate interactively, which will help you get the most out of the workshop. See the About Quest page for more information on getting an allocation.

Things you’ll learn in this workshop:

  • What is Quest? How is it structured?
  • Connect to Quest
  • Transfer files to Quest
  • Find and use software on Quest
  • What is parallel computing?
  • Submit a job to Quest

Prerequisites: Familiarity with the command line at the level of Command Line: Introduction. Optional, but highly recommended: A Quest allocation; see About Quest. Apply for the allocation at least a week before the workshop.

Location: Mudd Library, Large Classroom 2210