Python Virtual Environments and Package Management
Introduction
The next topic you will need is how to manage Python virtual environments and package dependencies.
We also include an exercise at the end of this document. Read through and complete the exercise.
We suggest that you follow along in a command shell on your local machine.
We’re going to build upon our Command Shells skills to install Python and manage virtual environments.
We also assume that you have a basic understanding of Python. If you need a tutorial, check out the Python Tutorial.
The great thing about Python is that it is a very popular language for scientific computing and so there are a lot of great libraries and tools that you can use.
This can also be a challenge because every Python project has different package dependencies, sometimes on specific versions of packages.
The solution to this problem is to use virtual environments.
There are basically two categories of environments and package managers:
- conda
- venv and pip
Conda
First a little clarification is in order since you’ll hear the terms “conda”, “miniconda”, and “anaconda” thrown around.
Conda is an open source package, dependency, and environment management system that can manage packages for multiple languages but is primarily used for Python.
Anaconda is a distribution of conda that includes a lot of packages bundled together. It is maintained by Anaconda, Inc. We don’t recommend using it because it is a large download and includes a lot of packages that you may not need.
Miniconda is a minimal installation of conda, maintained by Anaconda, Inc. With miniconda you can install only the packages you need.
Anaconda maintains a repository of packages that you can install with the conda install command. It sometimes happens that a package is not available in the conda repository but is available in the PyPI repository, which we discuss below.
To recap, conda is a tool for both package and environment management and we recommend using miniconda to tailor the installation to your needs.
venv and pip
The venv module is part of the Python standard library and supports creating Python virtual environments.
pip is a package manager that is usually included with a Python installation. It is the tool recommended by the Python Packaging Authority for installing Python packages. It installs packages from the Python Package Index (PyPI), which tends to be a more comprehensive repository of Python packages than the conda repository.
Python Versioning and Upgrading
pip vs python3 -m pip
A word of caution here when you have multiple versions of Python installed. The commands python, python3 and pip may be associated with different versions of Python.
A good habit is to check locations and versions of these commands with:
which python
python --version
which python3
python3 --version
which pip
pip --version
On MacOS, I get
% which python
python not found
% which python3
/Library/Frameworks/Python.framework/Versions/3.12/bin/python3
% python3 --version
Python 3.12.4
% which pip
/Library/Frameworks/Python.framework/Versions/3.12/bin/pip
% pip --version
pip 24.2 from /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pip (python 3.12)
% which pip3
/Library/Frameworks/Python.framework/Versions/3.12/bin/pip3
% pip3 --version
pip 24.2 from /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pip (python 3.12)
Previous versions of macOS shipped with Python 2.x as the default python, and pip was the Python 2.x package manager. Now, as you can see, the system installation has no python, and pip is effectively an alias for pip3.
From the location and version shown above, we can see that pip, pip3, and python3 are all associated with the same version, in this case Python 3.12.4.
Note that in both conda and venv environments, pip and python are aliases for pip3 and python3.
pip is actually just a command-line wrapper around the pip module, so a good precaution is to use python3 -m pip to install packages. The command python3 -m pip always runs the pip module belonging to the python3 interpreter that your shell resolves, so packages land in the Python installation you are actually using.
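To see concretely which interpreter and which pip module go together, a small script can report both from inside Python itself. This is a minimal sketch: the function name describe_interpreter is made up for illustration, and it only inspects the interpreter that runs it.

```python
import importlib.util
import sys


def describe_interpreter():
    """Report which interpreter is running and where its pip module lives."""
    pip_spec = importlib.util.find_spec("pip")
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # None means this interpreter has no pip module installed.
        "pip_location": pip_spec.origin if pip_spec else None,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running the script with python3 and again with python (where it exists) shows whether the two commands resolve to the same installation.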
Installing and Upgrading Python
It’s very likely there is a newer version of Python available than is installed by default or was previously installed. The way you install Python depends on your operating system and command shell.
There are multiple ways to install.
Using python.org
Probably the easiest way is to download and install the MacOS install package from python.org downloads. This method allows you to install the latest version directly from the Python Software Foundation.
The installation from python.org will likely leave your old installation of Python and install another version in a different location.
Using Homebrew
Alternatively, you can use Homebrew to install and manage Python versions:
Install Homebrew if you haven’t already:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Install Python:
brew install python
To update Python using Homebrew:
brew update # Update Homebrew itself
brew upgrade python # Upgrade Python to the latest version
Verify the installed version:
python3 --version
Homebrew might not always have the absolute latest version available on python.org, but it provides an easy way to manage and update Python along with other packages.
On WSL Ubuntu, you can install Python from the package manager.
sudo apt update
sudo apt install python3
To see all available versions, you can use the apt list command.
apt list python3 -a
If there is a newer version available, you can upgrade Python by following these steps:
Update your package list:
sudo apt update
Check the latest version available through apt:
apt list python3 -a
If a newer version is available, upgrade Python:
sudo apt install --only-upgrade python3
Verify the new version:
python3 --version
If you need a specific newer version not available in the default repositories, you can use a PPA:
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.x # Replace x with the desired version number
Note: Using a PPA allows you to install versions of Python that may not be available in the standard Ubuntu repositories.
You might end up with multiple versions of Python installed simultaneously, which is fine. Just know that in both macOS command shells and WSL Ubuntu, the first matching executable found in the search path is the one that runs, and it is the one that which python3 reports.
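The search-path rule can be made visible by listing every matching executable in lookup order. The sketch below is an illustration for POSIX-style shells such as those on macOS and WSL Ubuntu; the helper name find_all_on_path is hypothetical and the script does not handle Windows file extensions.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # Keep only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    for position, executable in enumerate(find_all_on_path("python3")):
        marker = "(this one runs)" if position == 0 else "(shadowed)"
        print(executable, marker)
```

The first entry printed should agree with the output of which python3; later entries are installations that exist but are hidden by the earlier ones.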
Conda Installation
As mentioned above, we generally prefer the miniconda distribution of conda. You can find installation instructions at https://docs.anaconda.com/miniconda/.
It describes both the GUI and the command line installation methods.
Remember that for WSL Ubuntu, you’ll need to use the command line installation method.
Conda Environment
Next step is to create a conda environment. See here for complete instructions.
conda create -n my_test_env
conda activate my_test_env
To see what packages are installed in the environment, you can use the conda list command.
conda list
To deactivate the environment, you can use the conda deactivate command.
conda deactivate
To remove the environment, you can use:
conda remove -n my_test_env --all
Now let’s create a new environment with python 3.12.4 and see what packages are installed.
conda create -n my_test_env python=3.12.4
Now you can activate the environment and see what packages are installed.
conda activate my_test_env
conda list
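A script can also check which conda environment is active. The sketch below relies on the CONDA_DEFAULT_ENV and CONDA_PREFIX variables that conda activate exports into the shell; the function name active_conda_env is made up for illustration.

```python
import os


def active_conda_env(environ=None):
    """Return (name, prefix) of the active conda environment, or None."""
    if environ is None:
        environ = os.environ
    # `conda activate` exports both variables into the shell session.
    name = environ.get("CONDA_DEFAULT_ENV")
    prefix = environ.get("CONDA_PREFIX")
    if not name or not prefix:
        return None
    return name, prefix


if __name__ == "__main__":
    env = active_conda_env()
    if env is None:
        print("No conda environment is active")
    else:
        print(f"Active conda environment: {env[0]} at {env[1]}")
```

After conda activate my_test_env, the script should report my_test_env; after conda deactivate it should report base or no environment, depending on how conda is configured.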
Installing Packages
To install a package such as jupyter, you can use the conda install command.
conda install jupyter
You can also install a specific version of a package or major version of a package.
conda install jupyter=1.0.0
# install the latest minor version of the major version 1
conda install jupyter=1
To list the available versions of packages to install from conda, you can use the conda search command. Here’s how you can do it:
conda search package_name
For example, if you want to see all available versions of the jupyter package, you would run:
conda search jupyter
This will show you a list of all available versions of the package, along with information about which channel they’re from and which platforms they support.
If you want to see versions for a specific channel, you can use the --channel or -c option:
conda search --channel conda-forge jupyter
To see more details about a package, including its dependencies, you can use the --info flag:
conda search --info jupyter
If you want to see all packages that match a certain pattern, you can use wildcards:
conda search 'python=3.8*'
This would show all versions of Python 3.8.x available.
Remember, the available versions may depend on your current conda configuration, including which channels you have enabled. You might want to update your conda first to ensure you’re seeing the most recent information:
# make sure you are not in a conda environment
conda deactivate
conda update conda
These commands will help you find the specific versions of packages available for installation through conda.
Updating Packages
To check for updates and then update packages in a specific conda environment, you can follow these steps:
- First, activate the environment you want to check:
conda activate my_test_env
- To check for updates without actually installing them:
conda update --all --dry-run
This command will show you what packages would be updated without actually performing the update.
- If you’re satisfied with the proposed updates, you can perform the actual update:
conda update --all
This will update all packages in the current environment to their latest compatible versions.
You can also update specific packages by naming them:
conda update package1 package2
Remember that conda will only update to versions that are compatible with other packages in your environment. Sometimes, major version updates might require manually specifying the new version or recreating the environment.
By regularly checking for and applying updates, you can ensure your environment has the latest features and security patches.
conda environment.yml
It’s often a good idea to make it easy for others to recreate your environment. You can do this by creating an environment.yml file.
conda env export > environment.yml
Then to create an environment from the environment.yml file, you can use the conda env create command.
conda env create -f environment.yml
Note that this creates a new environment with the same name and the same packages as the environment that was exported.
You can see which environments are available with the conda env list command.
Here’s a handy conda cheat sheet
venv and pip
Creating a Virtual Environment
To create a virtual environment using venv, you can use the following command:
python3 -m venv my_test_env
This creates a new virtual environment named my_test_env in the current directory.
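The same operation is available from Python through the standard library venv module, which is what python3 -m venv calls. The following is a minimal sketch: the helper name create_env is hypothetical, and the demo passes with_pip=False only to keep it fast.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its config file path."""
    venv.create(env_dir, with_pip=with_pip)
    # Every venv has a pyvenv.cfg recording the Python it was created from.
    return Path(env_dir) / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        config = create_env(Path(scratch) / "my_test_env", with_pip=False)
        print(config.read_text())
```

The printed pyvenv.cfg includes a home entry that points to the Python installation the environment was built from, which is a quick way to confirm which interpreter an environment uses.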
Activating the Environment
To activate the environment on macOS or Linux (including WSL Ubuntu):
source my_test_env/bin/activate
On Windows (Command Prompt):
my_test_env\Scripts\activate
When activated, your command prompt should change to indicate the active environment.
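Besides the prompt, Python itself can tell whether it is running from a venv. This is a small sketch with a made-up function name; note that it detects environments created with venv, not conda environments.

```python
import sys


def in_virtual_environment():
    """True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Running inside a venv:", in_virtual_environment())
    print("sys.prefix:", sys.prefix)
```

Running this before and after activation should flip the first line from False to True.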
Deactivating the Environment
To deactivate the environment:
deactivate
Installing Packages
To install a package such as jupyter, you can use the pip install command:
python -m pip install jupyter
You can also install a specific version of a package:
python -m pip install jupyter==1.0.0
# install the latest minor version of the major version 1
# (~=1.0 means >=1.0 and ==1.*, whereas ~=1.0.0 would only allow 1.0.x)
python -m pip install "jupyter~=1.0"
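The compatible-release operator ~= is easy to misread: ~=1.0 accepts any 1.x release at or above 1.0, while ~=1.0.0 accepts only 1.0.x releases. The sketch below is a simplified model of that rule for plain numeric versions. It is not pip's implementation, it ignores pre-releases and other PEP 440 details, and the function names are made up for illustration.

```python
def parse_version(text):
    """Turn '1.4.2' into (1, 4, 2); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(spec, candidate):
    """Simplified check of `candidate` against the specifier `~=spec`."""
    wanted = parse_version(spec)
    if len(wanted) < 2:
        raise ValueError("~= needs at least two version components")
    found = parse_version(candidate)
    # Pad the candidate with zeros so that 1.0 compares equal to 1.0.0.
    found = found + (0,) * max(0, len(wanted) - len(found))
    prefix = wanted[:-1]  # every component except the last must match exactly
    return found[:len(prefix)] == prefix and found >= wanted


if __name__ == "__main__":
    for spec, candidate in [("1.0", "1.4.2"), ("1.0", "2.0"),
                            ("1.0.0", "1.0.5"), ("1.0.0", "1.1.0")]:
        print(f"~={spec} accepts {candidate}: {is_compatible(spec, candidate)}")
```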
Listing Installed Packages
To see what packages are installed in the environment:
pip list
Checking for Package Updates
To check for updates to installed packages:
pip list --outdated
Updating Packages
To update a specific package:
python -m pip install --upgrade package_name
To update all packages:
# tail -n +3 skips the two header lines that pip list prints
pip list --outdated | tail -n +3 | cut -d ' ' -f1 | xargs -n1 python -m pip install -U
Creating Requirements File
To create a requirements.txt file listing all installed packages:
pip freeze > requirements.txt
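To get a feel for what ends up in that file, the standard library can produce a similar list of pinned name==version lines. This is only an approximation of pip freeze, which also handles editable installs and direct URLs; the function name pinned_requirements is made up for illustration.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    print("\n".join(pinned_requirements()))
```

Run inside an activated environment, the output should list only the packages installed in that environment.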
Installing from Requirements File
To install packages from a requirements.txt file:
python -m pip install -r requirements.txt
Removing the Environment
To remove the virtual environment, deactivate it first and then delete the environment folder. On macOS or Linux (including WSL Ubuntu):
rm -rf my_test_env
On Windows (Command Prompt):
rmdir /s /q my_test_env
Always use python -m pip instead of just pip to ensure you’re using the correct version of pip associated with your current Python environment.
A word on package caching
Note that pip and conda will cache packages, often in your home directory. This could be a problem because, for example, many versions of the same package can be cached and use up disk space. This can be especially problematic on SCC, where your home directory is quite limited in size. We’ll talk about that when we cover SCC.
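To see how much space a cache is using, you can total the file sizes under its directory. The sketch below is generic. The path ~/.cache/pip is an assumed location that varies by platform (the command pip cache dir prints the real one), and the function name directory_size_bytes is made up for illustration.

```python
import os


def directory_size_bytes(root):
    """Total size in bytes of all regular files under root."""
    total = 0
    for current_dir, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(current_dir, filename)
            if not os.path.islink(path):  # do not follow symbolic links
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    # Assumed location; run `pip cache dir` to see the real one on your system.
    cache = os.path.expanduser("~/.cache/pip")
    print(f"{cache}: {directory_size_bytes(cache) / 1e6:.1f} MB")
```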
Exercise
Exercise Introduction
Let’s put what we’ve learned into practice. For this exercise, you’ll need to create a new virtual environment and install pytorch and related packages into the environment. You’ll save a snapshot of your environment as well. Then you’ll need to copy the Python script from below into a file in your working directory. Run the script to train and evaluate the model. The script is a simple CNN for classifying CIFAR-10 images. It also writes out the trained model to a file. You’ll then need to upload your trained model file to Gradescope. You’ll also need to upload your environment.yml or requirements.txt file.
Exercise Instructions
- Create a directory for your exercise and cd into it.
- Create a new virtual environment with either venv or conda and then activate it.
- Install pytorch and related packages into the environment as described below.
- Copy the python script from below into a file in your working directory. The file should have extension .py, such as cifar10.py.
- Run the script to train and evaluate the model. You could for example run it with python cifar10.py.
- Extra credit: Improve the model and/or training regime and re-run the script.
- Save your python environment into an environment.yml for conda or a requirements.txt for pip.
- Upload the .pt model file and your environment file to Gradescope. Enter your name for the leaderboard.
Install PyTorch
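From the PyTorch website, the install instructions are below. The first two commands are the standard conda and pip installs; the last two select CPU-only builds through the cpuonly package and the CPU wheel index.
conda install pytorch::pytorch torchvision -c pytorch
pip install torch torchvision
conda install pytorch torchvision cpuonly -c pytorch
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cpu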
Python Script
import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

# Define transformations for the dataset
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Download and load the CIFAR-10 dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)

# Classes
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# Define the CNN model
class SmallCNN(nn.Module):
    def __init__(self):
        super(SmallCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 32 * 8 * 8)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Initialize the model, loss function, and optimizer
model = SmallCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Training the model
def train_model(model, trainloader, criterion, optimizer, epochs=5):
    for epoch in range(epochs):  # Loop over the dataset multiple times
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # Get the inputs; data is a list of [inputs, labels]
            inputs, labels = data

            # Zero the parameter gradients
            optimizer.zero_grad()

            # Forward pass
            outputs = model(inputs)
            loss = criterion(outputs, labels)

            # Backward pass and optimize
            loss.backward()
            optimizer.step()

            # Print statistics
            running_loss += loss.item()
            if i % 100 == 99:  # Print every 100 mini-batches
                print(f'[Epoch {epoch + 1}, Batch {i + 1}] loss: {running_loss / 100:.3f}')
                running_loss = 0.0

    print('Finished Training')

    # Using the TorchScript method for model saving
    # Important! Do not change the following 2 lines of code except for the model name
    scripted_model = torch.jit.script(model)
    torch.jit.save(scripted_model, 'cifar10-model.pt')
    print('Model saved as cifar10-model.pt')

# Call the training function
train_model(model, trainloader, criterion, optimizer)

# Evaluation function to test the accuracy
def evaluate_model(model, testloader):
    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print(f'Accuracy of the network on the 10,000 test images: {accuracy:.2f}%')
    return accuracy

# Call the evaluation function
evaluate_model(model, testloader)
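If you change the model for extra credit, the input size of the first fully connected layer (32 * 8 * 8 in the script) has to match the output of the convolution and pooling layers. The short calculation below traces where that number comes from, using the standard output-size formula for convolution and pooling layers. It is a plain-Python sketch that does not need PyTorch, and the helper name conv_output_size is made up for illustration.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32  # CIFAR-10 images are 32x32 pixels
size = conv_output_size(size, kernel=3, padding=1)  # conv1 keeps 32
size = conv_output_size(size, kernel=2, stride=2)   # pool halves to 16
size = conv_output_size(size, kernel=3, padding=1)  # conv2 keeps 16
size = conv_output_size(size, kernel=2, stride=2)   # pool halves to 8

flattened = 32 * size * size  # conv2 produces 32 output channels
print(flattened)  # 2048, the same as 32 * 8 * 8
```

Adding another pooling step or changing the number of channels changes this product, and the nn.Linear input size and the x.view call must be updated to match.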
Recap
In this module, we’ve covered:
- Creating and managing python environments with conda and venv.
- Installing packages with conda and pip.
- Creating and managing environment.yml and requirements.txt files.
- Training and evaluating a simple CNN on the CIFAR-10 dataset.