Python Virtual Environments and Package Management
Introduction
The next topic you will need to know is how to manage Python virtual environments and package dependencies.
We also include an exercise at the end of this document. Read through and complete the exercise.
We suggest that you follow along in a command shell on your local machine.
We’re going to build upon our Command Shells skills to install Python and manage virtual environments.
We also assume that you have a basic understanding of Python. If you need a tutorial, check out the Python Tutorial.
The great thing about Python is that it is a very popular language for scientific computing, so there are many excellent libraries and tools you can use.
This can also be a challenge, because every Python project has its own package dependencies, often on specific versions of packages.
The solution to this problem is to use virtual environments.
There are basically two categories of environments and package managers:
- conda
- venv and pip
Conda
First a little clarification is in order since you’ll hear the terms “conda”, “miniconda”, and “anaconda” thrown around.
Conda is an open source package, dependency, and environment management system that can manage packages for multiple languages but is primarily used for Python.
Anaconda is a distribution of conda that includes a lot of packages bundled together. It is maintained by Anaconda, Inc. We don’t recommend using it because it is a large download and includes a lot of packages that you may not need.
Miniconda is a minimal installation of conda, maintained by Anaconda, Inc. With miniconda you can install only the packages you need.
Anaconda maintains a repository of packages that you can install with the conda install command. It sometimes happens that a package is not available in the conda repository, but is available in the PyPI repository, which we discuss below.
To recap, conda is a tool for both package and environment management and we recommend using miniconda to tailor the installation to your needs.
venv and pip
The venv module is part of the Python standard library and supports creating lightweight Python virtual environments.
pip is a package manager that is usually included with a Python installation. It is the tool recommended by the Python Packaging Authority for installing Python packages. It installs packages from the Python Package Index (PyPI), which tends to be a more comprehensive repository of Python packages than the conda repository.
Python Versioning and Upgrading
pip vs python3 -m pip
A word of caution when you have multiple versions of Python installed: the commands python, python3, and pip may each be associated with a different version of Python.
A good habit is to check locations and versions of these commands with:
which python
python --version
which python3
python3 --version
which pip
pip --version
On macOS, I get:
% which python
python not found
% which python3
/Library/Frameworks/Python.framework/Versions/3.12/bin/python3
% python3 --version
Python 3.12.4
% which pip
/Library/Frameworks/Python.framework/Versions/3.12/bin/pip
% pip --version
pip 24.2 from /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pip (python 3.12)
% which pip3
/Library/Frameworks/Python.framework/Versions/3.12/bin/pip3
% pip3 --version
pip 24.2 from /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pip (python 3.12)
Previous versions of macOS shipped with Python 2.x as the default python, and pip was the Python 2.x package manager. Now, as you can see, this installation has no python command at all, and pip is effectively an alias for pip3.
From the locations and versions shown above, we can see that pip, pip3, and python3 are all associated with the same installation, in this case Python 3.12.4.
Note that inside both conda and venv environments, pip and python are aliases for pip3 and python3.
pip is actually just a command-line wrapper around the pip module, so a good precaution is to use python3 -m pip to install packages. The command python3 -m pip is always associated with the version of Python that you are currently using.
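For example, you can confirm which interpreter python3 -m pip is tied to, and upgrade pip itself for that interpreter (an optional check; the versions and paths printed will differ on your machine):
python3 -m pip --version
python3 -m pip install --upgrade pip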
Installing and Upgrading Python
It’s very likely there is a newer version of Python available than is installed by default or was previously installed. The way you install Python depends on your operating system and command shell.
There are multiple ways to install it.
Using python.org
Probably the easiest way is to download and install the macOS installer package from the python.org downloads page. This method allows you to install the latest version directly from the Python Software Foundation.
The installation from python.org will likely leave your old installation of Python in place and install the new version in a different location.
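On macOS, the python.org installers place each version under the Python framework directory, so you can list what is installed with the command below (this path applies to python.org installs; Homebrew and system Pythons live elsewhere):
ls /Library/Frameworks/Python.framework/Versions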
Using Homebrew
Alternatively, you can use Homebrew to install and manage Python versions:
Install Homebrew if you haven’t already:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Install Python:
brew install python
To update Python using Homebrew:
brew update          # Update Homebrew itself
brew upgrade python  # Upgrade Python to the latest version
Verify the installed version:
python3 --version
Homebrew might not always have the absolute latest version available on python.org, but it provides an easy way to manage and update Python along with other packages.
Using apt on WSL Ubuntu
On WSL Ubuntu, you can install Python from the package manager.
sudo apt update
sudo apt install python3
To see all available versions, you can use the apt list command.
apt list python3 -a
If there is a newer version available, you can upgrade Python by following these steps:
Update your package list:
sudo apt update
Check the latest version available through apt:
apt list python3 -a
If a newer version is available, upgrade Python:
sudo apt install --only-upgrade python3
Verify the new version:
python3 --version
If you need a specific newer version not available in the default repositories, you can use a PPA:
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.x  # Replace x with the desired version number
Note: Using a PPA allows you to install versions of Python that may not be available in the standard Ubuntu repositories.
There are cases where you might end up with multiple versions of Python installed simultaneously, which is fine, but just know that on both macOS command shells and WSL Ubuntu, the first python3 executable found on the search path is the one that runs and the one reported by which python3.
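To see every matching executable on your search path, in the order the shell will find them, you can pass -a to which (works in both macOS shells and WSL Ubuntu bash):
which -a python3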
Conda Installation
As mentioned above, we generally prefer the miniconda distribution of conda. You can find installation instructions at https://docs.anaconda.com/miniconda/.
It describes both the GUI and the command line installation methods.
Remember that for WSL Ubuntu, you’ll need to use the command line installation method.
Conda Environment
The next step is to create a conda environment. See here for complete instructions.
conda create -n my_test_env
conda activate my_test_env
To see what packages are installed in the environment, you can use the conda list command.
conda list
To deactivate the environment, you can use the conda deactivate command.
conda deactivate
To remove the environment, you can use:
conda remove -n my_test_env --all
Now let’s create a new environment with python 3.12.4 and see what packages are installed.
conda create -n my_test_env python=3.12.4
Now you can activate the environment and see what packages are installed.
conda activate my_test_env
conda list
Installing Packages
To install a package such as jupyter, you can use the conda install command.
conda install jupyter
You can also install a specific version of a package or major version of a package.
conda install jupyter=1.0.0
# install the latest minor version of the major version 1
conda install jupyter=1
To list the available versions of packages to install from conda, you can use the conda search command. Here’s how you can do it:
conda search package_name
For example, if you want to see all available versions of the jupyter package, you would run:
conda search jupyter
This will show you a list of all available versions of the package, along with information about which channel they’re from and which platforms they support.
If you want to see versions for a specific channel, you can use the --channel or -c option:
conda search --channel conda-forge jupyter
To see more details about a package, including its dependencies, you can use the --info flag:
conda search --info jupyter
If you want to see all packages that match a certain pattern, you can use wildcards:
conda search 'python=3.8*'
This would show all versions of Python 3.8.x available.
Remember, the available versions may depend on your current conda configuration, including which channels you have enabled. You might want to update your conda first to ensure you’re seeing the most recent information:
# make sure you are not in a conda environment
conda deactivate
conda update conda
These commands will help you find the specific versions of packages available for installation through conda.
Updating Packages
To check for updates and then update packages in a specific conda environment, you can follow these steps:
- First, activate the environment you want to check:
conda activate my_test_env
- To check for updates without actually installing them:
conda update --all --dry-run
This command will show you what packages would be updated without actually performing the update.
- If you’re satisfied with the proposed updates, you can perform the actual update:
conda update --all
This will update all packages in the current environment to their latest compatible versions.
You can also update specific packages by naming them:
conda update package1 package2
Remember that conda will only update to versions that are compatible with other packages in your environment. Sometimes, major version updates might require manually specifying the new version or recreating the environment.
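If you do need to jump to a new major version, one option (a sketch using the placeholder package_name) is to request that version explicitly rather than relying on conda update:
conda install package_name=2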
By regularly checking for and applying updates, you can ensure your environment has the latest features and security patches.
conda environment.yml
It’s often a good idea to make it easy for others to recreate your environment. You can do this by creating an environment.yml file.
conda env export > environment.yml
Then to create an environment from the environment.yml file, you can use the conda env create command.
conda env create -f environment.yml
Note that this will create a new environment with the same packages as the current environment and the same name as the environment it was exported from.
You can see which environments are available with the conda env list command.
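Two optional variations can be handy here: exporting only the packages you explicitly requested, which tends to be more portable across operating systems, and overriding the environment name stored in the file when recreating it:
conda env export --from-history > environment.yml
conda env create -f environment.yml -n another_env_name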
Here’s a handy conda cheat sheet
venv and pip
Creating a Virtual Environment
To create a virtual environment using venv, you can use the following command:
python3 -m venv my_test_env
This creates a new virtual environment named my_test_env in the current directory.
Activating the Environment
To activate the environment:
On macOS and Linux:
source my_test_env/bin/activate
On Windows:
my_test_env\Scripts\activate
When activated, your command prompt should change to indicate the active environment.
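You can also confirm that the environment is active by checking which interpreter the shell now finds first (on macOS and WSL Ubuntu the path should point inside my_test_env when activation worked):
which python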
Deactivating the Environment
To deactivate the environment:
deactivate
Installing Packages
To install a package such as jupyter, you can use the pip install command:
python -m pip install jupyter
You can also install a specific version of a package:
python -m pip install jupyter==1.0.0
# install the latest release within major version 1
python -m pip install jupyter~=1.0
Listing Installed Packages
To see what packages are installed in the environment:
pip list
Checking for Package Updates
To check for updates to installed packages:
pip list --outdated
Updating Packages
To update a specific package:
python -m pip install --upgrade package_name
To update all packages:
pip list --outdated | tail -n +3 | cut -d ' ' -f1 | xargs -n1 python -m pip install -U
Creating Requirements File
To create a requirements.txt file listing all installed packages:
pip freeze > requirements.txt
Installing from Requirements File
To install packages from a requirements.txt file:
python -m pip install -r requirements.txt
Removing the Environment
To remove the virtual environment, simply delete the environment folder:
On macOS and Linux:
rm -rf my_test_env
On Windows:
rmdir /s /q my_test_env
Remember to deactivate the environment before removing it.
Always use python -m pip instead of just pip to ensure you’re using the correct version of pip associated with your current Python environment.
A word on package caching
Note that pip and conda will cache packages, often in your home directory. This can be a problem because, for example, many versions of the same package can be cached and use up disk space. This can be especially problematic on SCC, where your home directory is quite limited in size. We’ll talk about that when we cover SCC.
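If cache size becomes an issue, both tools provide commands to inspect and clear their caches (safe to run; packages are simply re-downloaded the next time they are needed):
# show where pip keeps its cache, then clear it
python3 -m pip cache dir
python3 -m pip cache purge
# remove conda's cached package tarballs and unused packages
conda clean --all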
Exercise
Exercise Introduction
Let’s put what we’ve learned into practice. For this exercise, you’ll need to create a new virtual environment and install PyTorch and related packages into it. You’ll save a snapshot of your environment as well. Then you’ll need to copy the Python script from below into a file in your working directory. Run the script to train and evaluate the model. The script is a simple CNN for classifying CIFAR-10 images. It also writes out the trained model to a file. You’ll then need to upload your trained model file to Gradescope. You’ll also need to upload your environment.yml or requirements.txt file.
Exercise Instructions
- Create a directory for your exercise and cd into it.
- Create a new virtual environment with either venv or conda and then activate it.
- Install PyTorch and related packages into the environment as described below.
- Copy the Python script from below into a file in your working directory. The file should have the extension .py, such as cifar10.py.
- Run the script to train and evaluate the model. You could, for example, run it with python cifar10.py.
- Extra credit: Improve the model and/or training regime and re-run the script.
- Save your Python environment into an environment.yml for conda or a requirements.txt for pip.
- Upload the .pt model file and your environment file to Gradescope. Enter your name for the leaderboard.
Install PyTorch
From the PyTorch website, the install instructions are:
On macOS, with conda:
conda install pytorch::pytorch torchvision -c pytorch
On macOS, with pip:
pip install torch torchvision
On WSL Ubuntu (CPU-only), with conda:
conda install pytorch torchvision cpuonly -c pytorch
On WSL Ubuntu (CPU-only), with pip:
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cpu
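After installing, a quick way to confirm that PyTorch is importable in the active environment is the one-liner below (optional; the version printed will depend on what was installed):
python -c "import torch; print(torch.__version__)"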
Python Script
import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
# Define transformations for the dataset
transform = transforms.Compose(
[transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
# Download and load the CIFAR-10 dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)
# Classes
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
# Define the CNN model
class SmallCNN(nn.Module):
def __init__(self):
super(SmallCNN, self).__init__()
self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
self.pool = nn.MaxPool2d(2, 2)
self.fc1 = nn.Linear(32 * 8 * 8, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
x = x.view(-1, 32 * 8 * 8)
x = F.relu(self.fc1(x))
x = self.fc2(x)
return x
# Initialize the model, loss function, and optimizer
model = SmallCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# Training the model
def train_model(model, trainloader, criterion, optimizer, epochs=5):
for epoch in range(epochs): # Loop over the dataset multiple times
running_loss = 0.0
for i, data in enumerate(trainloader, 0):
# Get the inputs; data is a list of [inputs, labels]
inputs, labels = data
# Zero the parameter gradients
optimizer.zero_grad()
# Forward pass
outputs = model(inputs)
loss = criterion(outputs, labels)
# Backward pass and optimize
loss.backward()
optimizer.step()
# Print statistics
running_loss += loss.item()
if i % 100 == 99: # Print every 100 mini-batches
print(f'[Epoch {epoch + 1}, Batch {i + 1}] loss: {running_loss / 100:.3f}')
running_loss = 0.0
print('Finished Training')
# Call the training function
train_model(model, trainloader, criterion, optimizer)
# Using the TorchScript method for model saving (done after training so the saved weights are the trained ones)
# Important! Do not change the following 2 lines of code except for the model name
scripted_model = torch.jit.script(model)
torch.jit.save(scripted_model, 'cifar10-model.pt')
print('Model saved as cifar10-model.pt')
# Evaluation function to test the accuracy
def evaluate_model(model, testloader):
correct = 0
total = 0
with torch.no_grad():
for data in testloader:
images, labels = data
outputs = model(images)
_, predicted = torch.max(outputs.data, 1)
total += labels.size(0)
correct += (predicted == labels).sum().item()
accuracy = 100 * correct / total
print(f'Accuracy of the network on the 10,000 test images: {accuracy:.2f}%')
return accuracy
# Call the evaluation function
evaluate_model(model, testloader)
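Once the script has finished, you can sanity-check that the saved TorchScript file loads back correctly (an optional step; adjust the filename if you renamed the model):
python -c "import torch; m = torch.jit.load('cifar10-model.pt'); print(m)"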
Recap
In this module, we’ve covered:
- Creating and managing Python environments with conda and venv.
- Installing packages with conda and pip.
- Creating and managing environment.yml and requirements.txt files.
- Training and evaluating a simple CNN on the CIFAR-10 dataset.