Professional Development Opportunities

Tools from the Couch is an opportunity to share the diverse expertise that exists in the department and broader community on the computational (or otherwise!) tools that we use for our research. This series extends, and includes, the Data Science Groups from past terms.

Please register your interest in the series here.

Please email Mark Richardson for more information.

 

Upcoming Sessions

 

Wednesday, Aug 18th, 2021, 2:30 – 4:00 pm (EDT): Proposing Time at Facilities (postponed from June 9th)
Lead: Mark Richardson
Zoom details sent after registration. Please register here.
Details: Big science today often relies on access to big facilities, which could include a large telescope, a large high performance computing cluster, a scanning electron microscope, etc. Ultimately, gaining time on these facilities requires writing scientific proposals that both argue for the scientific value of the questions you are trying to answer and explain the amount of resources you are requesting. In this Tools from the Couch / Professional Development & Learning Session, Dr. Mark Richardson (possibly joined by other scientists in various fields) will give an overview of proposing for time on these large facilities, and discuss the specifics of a few examples.

Others upcoming (dates not yet confirmed):

  • Group Discussion of where we learned to code.
  • A hands-on intro to Git
  • High Performance Computing Resources at Queen’s
  • Parallelization: MPI
  • Parallelization: Using your GPU to do more than render.
 

Previous Sessions

 

Wednesday, July 7th, 2021, 2:30 – 4:00 pm (EDT): Parallelization and Message Passing Interface (MPI: Distributed Memory)
Lead: Mark Richardson
Details: Modern simulations and data analysis often rely on parallelization, where multiple processors either work together on small sub-tasks of your code, or divide and conquer larger work. In this session I will give a brief recap of parallelization, then dive into the details of distributed-memory parallelization with MPI (Message Passing Interface). I will showcase converting some serial codes into parallel ones and highlight speed-up gains. This session will be a prerequisite for the CAC session on July 21st.

Wednesday, July 21st, 2021, 2:30 – 4:00 pm (EDT): Python and Parallelization
Lead: Fernando Hernandez Leiva (Centre for Advanced Computing, Queen’s)
Details: Modern simulations and data analysis often rely on parallelization, where multiple processors either work together on small sub-tasks of your code, or divide and conquer larger work. In this session CAC’s Junior Analytics Developer Fernando Hernandez Leiva will build on the MPI discussion from July 7th to discuss using MPI in Python.

Tuesday, Nov 3rd, 2020: Multiprocess and Python
Lead: Mark Richardson

 

Thursday, Aug 20th, 2020, at 1pm: Introduction to Parallelization: MPI, OpenMP, and Python
Lead: Mark Richardson
Description: Modern simulations and data analysis often rely on parallelization, where multiple processors either work together on small sub-tasks of your code, or divide and conquer larger work. In this session I will give an overview of parallelization, showcase converting some serial codes into parallel ones, highlight speed-up gains, and also give a brief intro to the Queen’s Centre for Advanced Computing. During this session we cover OpenMP multithreading parallelization.
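As a rough illustration of the divide-and-conquer idea described above (not the code shown in the session), here is a minimal sketch using Python's standard-library concurrent.futures module. The function names and the sum-of-squares workload are assumptions chosen purely for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks of near-equal size."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks get one additional element each.
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum(bounds):
    """Serial kernel: sum of squares over one chunk of the range."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def serial_sum(n):
    """Reference serial version: one process handles the whole range."""
    return partial_sum((0, n))


def parallel_sum(n, workers=4):
    """Hand one chunk to each worker process, then combine the partial results."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, split_range(n, workers)))
```

To try it, call parallel_sum from a script under an if __name__ == "__main__": guard and time it against serial_sum with time.perf_counter. Any speed-up depends on the number of cores available and on the overhead of starting the worker processes, so small problems may well run slower in parallel.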

 

Thursday, July 16th, 2020: Bayesian Analysis: Using MCMC in practice.
Lead: Aaron Vincent
Overview: We will continue our discussion of Bayesian inference methods with a few hands-on exercises. I’ll work through a basic emcee tutorial, demonstrate some of the functionality of MultiNest, and finish with some free time to play around with MCMC samplers.

 

Thursday, July 9th, 2020: Bayesian vs Frequentist: How they differ.
Lead: Connor Stone
Overview: In this session we will explore a specific problem to get a feel for the difference between Bayesian and frequentist statistics. Likely we have all encountered the On/Off problem in one form or another, either when measuring the brightness of an object with a telescope, counting events in a dark matter detector, counting photons, or measuring the radioactivity of a source. Here we will explore the problem in detail and arrive at a counterintuitive result where the Bayesian and frequentist predictions disagree even as the amount of data gets very large! There is of course a good explanation for this; tune in to find out!

 

Thursday, June 25th, 2020: 
Discussion Topic: MCMC and all that: how to do parameter inference and model comparison when things get messy
Lead: Aaron Vincent
Overview: Hypothesis testing is one of the most important parts of scientific research. This is typically expressed as model comparison (is supersymmetry a better model than hyperdupersymmetry?) or parameter inference (what is the most likely value of the sneutralinissimo mass and coupling in my hyperdupersymmetry model?). One of the most challenging aspects in either context is that of sampling. If the parameter space and/or amount of data is large, incorrect sampling can yield spectacularly wrong results, while inefficient sampling can easily keep your CPU busy well past the point when your funding runs out. To address these issues, I’ll introduce the Bayesian framework behind efficient sampling techniques, and describe two common methods: MCMC and nested sampling (Frequentists: don’t panic, you can use these tools too). Time allowing, we’ll end with a hands-on tutorial using emcee and perhaps even MultiNest.
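To give a feel for what an MCMC sampler does, here is a minimal sketch of a random-walk Metropolis sampler written with only the Python standard library. It is not emcee or MultiNest, and it is not the session's material: the toy problem (inferring the mean of Gaussian data with a flat prior), the function names, and the step size are all assumptions made for illustration.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0):
    """Log-posterior (up to a constant) for the mean of Gaussian data, flat prior on mu."""
    return -0.5 * sum((x - mu) ** 2 for x in data) / sigma**2


def metropolis(log_prob, start, n_steps, step_size, rng):
    """Random-walk Metropolis sampler for a single parameter."""
    chain = [start]
    current_lp = log_prob(start)
    for _ in range(n_steps):
        proposal = chain[-1] + rng.gauss(0.0, step_size)
        proposal_lp = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            chain.append(proposal)
            current_lp = proposal_lp
        else:
            chain.append(chain[-1])  # rejected: repeat the current value
    return chain


rng = random.Random(42)
data = [rng.gauss(3.0, 1.0) for _ in range(50)]  # synthetic data, true mean 3.0
chain = metropolis(lambda mu: log_posterior(mu, data), start=0.0,
                   n_steps=5000, step_size=0.5, rng=rng)
samples = chain[1000:]  # discard the early steps as burn-in
posterior_mean = sum(samples) / len(samples)
```

For this toy problem the posterior mean should sit close to the sample mean of the data, which makes it an easy sanity check. Packages such as emcee wrap the same accept/reject idea in more efficient, multi-walker schemes.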

Prerequisite: An installation of Python and Anaconda.

Click here for video and slides from the session.

 

Thursday, June 18th, 2020: Version Control: Git, GitLab, GitHub, and Bitbucket
Lead: Self-directed
Overview: Version control is a system for tracking how a collection of files changes over time. A version control system preserves the state of the files through time, highlights what changed between versions, and captures commentary on why the changes were made. Most importantly, version control supports multiple users: the files are stored in an online repository, and members of a collaboration can access the remote repository, make local copies, edit them, and then merge their changes back into the online repository. Version control also tracks who made which changes. While multiple version control systems exist, including Subversion, Mercurial, and Git, Git is quickly becoming the standard in science. These resources will take you through the philosophy of version control, and Git in particular, and introduce you to the GitHub cloud hosting service. Other online services exist, including Bitbucket and GitLab. GitLab is notable in that it allows you to host the service on your own computer, among other things.

A future session will work through some more advanced Git commands together.

 

Thursday, June 11th, 2020

Title: Directly Editing Plots as Vector Image Files 

Lead: Zac Kenny

Did you know that you can export your plots from Python/MATLAB as editable SVG vector image files? Did you also know that you can easily use simple vector editing software to modify and edit those images? And, no, it’s not cheating!

I will begin with a very basic introduction to what a vector image file is, how it works and what you can do with it. We will open up some SVG plots and go through the process of organizing layers and preparing the image for editing. We will do some basic editing, resizing, recolouring, repositioning, and relabeling to enhance the clarity and message of our plots.

If you send me an SVG version of a plot you’re working on or would like feedback on, we can work on it during the session. There may be opportunity to send files during the session if time permits.

I will be using Adobe Illustrator (CC) for the demonstration. There are many vector editing applications available, and many are free, such as GIMP and Inkscape. However, for many reasons, Adobe Illustrator is much preferred. Any version of Adobe Illustrator will allow you to follow along, and Adobe does provide a free trial of Illustrator CC to try for 7 days.
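To make the idea of a vector image file concrete, here is a minimal sketch that builds a tiny SVG "plot" with the Python standard library and then edits it by changing one attribute. It is not the session's workflow: the function names, element ids, and coordinates are assumptions for illustration. The point is that an SVG is structured text in which every shape is a named, editable element, which is what lets a vector editor recolour or reposition parts of a plot.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_plot_svg(points, label, width=200, height=120):
    """Build a tiny SVG 'plot': one polyline plus one text label."""
    ET.register_namespace("", SVG_NS)  # serialize with a plain xmlns, no prefix
    svg = ET.Element(f"{{{SVG_NS}}}svg",
                     {"width": str(width), "height": str(height),
                      "viewBox": f"0 0 {width} {height}"})
    ET.SubElement(svg, f"{{{SVG_NS}}}polyline",
                  {"id": "curve", "fill": "none", "stroke": "black",
                   "points": " ".join(f"{x},{y}" for x, y in points)})
    text = ET.SubElement(svg, f"{{{SVG_NS}}}text",
                         {"id": "label", "x": "10", "y": "15"})
    text.text = label
    return svg


def recolour(svg, element_id, colour):
    """Edit the drawing by changing the stroke of the element with the given id."""
    for element in svg.iter():
        if element.get("id") == element_id:
            element.set("stroke", colour)


svg = make_plot_svg([(10, 100), (60, 40), (110, 70), (190, 20)], "signal")
recolour(svg, "curve", "crimson")
svg_text = ET.tostring(svg, encoding="unicode")
```

Writing svg_text to a file ending in .svg gives an image that opens in a browser or a vector editor, where the curve and the label remain separate objects.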

 

Thursday, June 4th, 2020
Title: Intro to Tools for Machine Learning
Lead: Connor Stone
Overview: I will begin with a basic introduction to optimization using SciPy’s minimize function. Hopefully this session will help you take full advantage of this function when fitting models to data (machine learning or otherwise). I will also use this as an opportunity to introduce regularization, a machine learning concept at the core of the success of many widely used algorithms such as neural networks. Note that it is very common to come across functions with many local minima, so I will show how, by using many independently minimized “walkers”, it is possible to optimize a notoriously challenging function: the Rastrigin function. I will then dive into scikit-learn. To follow along at home, please install scikit-learn (imported as sklearn). Note that we will discuss neural networks in this session only insofar as optimization is applicable to them. We will devote a full session to neural networks in the future.
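As a rough sketch of the many-walkers idea (one possible reading of it, not necessarily the approach shown in the session), the snippet below runs scipy.optimize.minimize from many random starting points on a two-dimensional Rastrigin function and keeps the best result. The helper name multistart_minimize, the number of walkers, the BFGS method, and the search bounds are assumptions chosen for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def rastrigin(x, amplitude=10.0):
    """Rastrigin function: global minimum 0 at x = 0, surrounded by many local minima."""
    x = np.asarray(x, dtype=float)
    return amplitude * x.size + np.sum(x**2 - amplitude * np.cos(2.0 * np.pi * x))


def multistart_minimize(func, n_walkers, n_dim, bound, seed=0):
    """Run a local minimizer from many random starts and keep the best result."""
    rng = np.random.default_rng(seed)
    starts = rng.uniform(-bound, bound, size=(n_walkers, n_dim))
    # Each walker only finds the local minimum near where it starts.
    results = [minimize(func, x0, method="BFGS") for x0 in starts]
    best = min(results, key=lambda r: r.fun)
    return best, results


best, results = multistart_minimize(rastrigin, n_walkers=200, n_dim=2, bound=5.12)
```

A single call to minimize usually gets stuck in whichever local minimum is nearest its starting point. Comparing best.fun with the spread of values in results shows how much the extra walkers help, although there is no guarantee that any of them reaches the global minimum.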

 

Thursday, May 28th, 2020
Discussion Topic: Group Discussion of Neural Networks and Machine Learning
Lead: Mark Richardson
Overview: As problems continue to require more data to solve, analysis of this data has become time-intensive. As a result, machine learning, the ability of a computer program to learn features of a dataset to expedite the analysis process, has become essential. From digitizing text to classifying supernovae, and everything in between, machine learning will play a huge part in the future of science. For this discussion session we will start a conversation about machine learning, both supervised and unsupervised, and then discuss neural networks in some detail. The format will be a discussion, where people can speak to their own experience implementing machine learning tools. Next week we will dive into some of the tools that exist and let people try them out at home. For this discussion, I encourage you to reflect on how you’ve used machine learning in your research etc. I also recommend the 3blue1brown series on neural networks: https://www.3blue1brown.com/neural-networks.

Click here for video and slides from the session.

 

Thursday, May 21st, 2020: After a two-week hiatus for new summer student orientation, we will be running a session on Overleaf and LaTeX.

Date: Thursday, May 21st, 1:00 PM EDT
Title: Overleaf and LaTeX
Speaker: Mark Richardson
Overview: This Thursday I will be doing a session on Overleaf and LaTeX. LaTeX is a powerful (if occasionally annoying) tool for typesetting your scientific writing, and it is the most common way physicists write their articles for publication. Overleaf is a cloud-based LaTeX service where you can host your LaTeX files and the relevant figures and bibliographies. You can also share projects with collaborators and work on your paper together in real time. If you are unfamiliar with Overleaf or LaTeX, I recommend you join the session.
Registration: You can register here for the sessions as a whole, or email Mark Richardson directly (Mark.Richardson@queensu.ca). I will only send out Zoom details to those who register.
 

Thursday, May 7th, 2020: Instead of our usual session, we encourage people to join the C++ session being covered by the Queen’s New Student Particle Astrophysics Workshop (email Ben Tam for more information).

 

Thursday, Apr 30th, 2020: 1:00 – 2:30pm EDT (90 minutes)

An introduction to Matplotlib: Making publication-quality plots

Background: Matplotlib is a plotting library for Python, used by many researchers today for generating publication-quality figures. In this session, we will cover an introduction to Matplotlib, as well as methods for improving the look and feel of your plots.
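As a small illustration of the kind of styling involved (a sketch, not the session's notebook), the snippet below draws one curve at roughly single-column width, labels the axes with units, and saves vector output. The data, figure size, and styling choices are assumptions chosen for illustration.

```python
import io
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Illustrative data: a damped oscillation sampled at 101 points.
x = [i * 0.1 for i in range(101)]
y = [math.exp(-0.3 * t) * math.cos(2.0 * t) for t in x]

fig, ax = plt.subplots(figsize=(3.4, 2.6))  # inches; roughly one journal column wide
ax.plot(x, y, color="black", linewidth=1.2, label="damped oscillation")
ax.set_xlabel("Time (s)")
ax.set_ylabel("Amplitude")
ax.tick_params(direction="in", top=True, right=True)  # ticks on all four sides
ax.legend(frameon=False, fontsize=8)
fig.tight_layout()

# Save as a vector format so the figure scales cleanly in print.
# In practice you would pass a filename such as "figure.pdf" instead of a buffer.
buffer = io.BytesIO()
fig.savefig(buffer, format="svg")
svg_bytes = buffer.getvalue()
plt.close(fig)
```

Setting the figure size in inches to match the target column width, rather than rescaling afterwards, keeps font sizes consistent between the plot and the surrounding text.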

Presenter: Simran Nerval is a physics master’s student at Queen’s University and researcher at the Arthur B. McDonald Canadian Astroparticle Physics Research Institute. She studies gravitational wave production during the expansion of the early universe. Before coming here, she did her undergraduate degree at the University of Toronto in physics and astronomy where she was a part of the Dunlap Institute for Astronomy and Astrophysics. While she was there, she was a part of the LiteBIRD collaboration and worked with the Canadian Space Agency to figure out how well the LiteBIRD satellite will be able to determine what occurred during the earliest moments of the universe. Alongside her research, she spends time doing a variety of science outreach for events ranging from classroom visits with Let’s Talk Science to Astronomy on Tap.

Materials: A pre-made Jupyter notebook is available here.

Prerequisites:

  • Python 3, with numpy, matplotlib
  • Jupyter notebook
  • Ideal: LaTeX font installed
  • Alternative: Access to an online Jupyter client, such as syzygy: https://queensu.syzygy.ca/
 

Thursday, Apr 23rd, 2020: 1:30 – 2:30pm EDT (60 minutes)

An introduction to Jupyter Notebooks and Python 3

Background: Jupyter Notebook is a platform for keeping a research diary as you work through your analysis with embedded Python. This session will introduce the Jupyter platform and then cover an introduction to Python 3, the dominant language for scripting and analysis today.

Presenter: Mark Richardson is the Education and Outreach Officer for the McDonald Institute. He held postdocs at the American Museum of Natural History in 2017-2018, and Oxford University from 2014-2017, and completed his PhD in modeling Galaxy Formation and Evolution at Arizona State University in 2014.

Materials: A pre-made Jupyter notebook is available here.

Prerequisites:

  • Python 3, preferably with numpy, matplotlib
  • Jupyter notebook
  • Alternative: Access to an online Jupyter client, such as syzygy: https://queensu.syzygy.ca/
 

Introduction to Unix, Unix tools, and Bash Scripting:

Thursdays, Apr 2nd, 9th, and 16th, 2020: 

Unix is the operating system underpinning most high performance computing systems. I will give an overview of navigating the Unix environment, including changing directories, making directories, reading files, and exploring three Unix commands: grep, sed, and awk. I will then introduce the idea of environment variables and how to write basic scripts. Materials include example data to work with.