Python Virtual Environments and Package Management

Introduction

The next useful topic that you will need to know is how to manage Python virtual environments and manage package dependencies.

Important

We also include an exercise at the end of this document. Read through and complete the exercise.

We suggest that you follow along in a command shell on your local machine.

We’re going to build upon our Command Shells skills to install Python and manage virtual environments.

We also assume that you have a basic understanding of Python. If you need a tutorial, check out the Python Tutorial.

The great thing about Python is that it is a very popular language for scientific computing and so there are a lot of great libraries and tools that you can use.

This can also be a challenge because every python project will have different package dependencies, and even dependencies on specific versions of packages.

The solution to this problem is to use virtual environments.

There are basically two categories of environments and package managers:

  1. conda
  2. venv and pip

Conda

First a little clarification is in order since you’ll hear the terms “conda”, “miniconda”, and “anaconda” thrown around.

Conda is an open source package, dependency, and environment management system that can manage packages for multiple languages but is primarily used for Python.

Anaconda is a distribution of conda that includes a lot of packages bundled together. It is maintained by Anaconda, Inc. We don’t recommend using it because it is a large download and includes a lot of packages that you may not need.

Miniconda is a minimal installation of conda, maintained by Anaconda, Inc. With miniconda you can install only the packages you need.

Anaconda maintians a repository of packages that you can install with the conda install command. It sometimes happens that a package is not available in the conda repository, but is available in the PyPI repository which we discuss below.

To recap, conda is a tool for both package and environment management and we recommend using miniconda to tailor the installation to your needs.

venv and pip

The venv module is a standard library that is included with Python that supports creating python virtual environments.

pip is a package manager that is usually included with a Python installation. It is the recommended way by the Python Packaging Authority to install Python packages. It installs packages from the Python Package Index (PyPI) which tends to be a more comprehensive repository of Python packages than the conda repository.

Python Versioning and Upgrading

pip vs python3 -m pip

A word of caution here when you have multiple versions of Python installed. The commands python, python3 and pip may be associated with different versions of python.

A good habit is to check locations and versions of these commands with:

which python
python --version

which python3
python3 --version

which pip
pip --version

On MacOS, I get

% which python
python not found

% which python3
/Library/Frameworks/Python.framework/Versions/3.12/bin/python3
% python3 --version
Python 3.12.4

% which pip
/Library/Frameworks/Python.framework/Versions/3.12/bin/pip
% pip --version
pip 24.2 from /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pip (python 3.12)

% which pip3
/Library/Frameworks/Python.framework/Versions/3.12/bin/pip3
% pip3 --version
pip 24.2 from /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pip (python 3.12)

It used to be on previous versions of MacOS it would come with Python 2.x as default as python, and pip would be the Python 2.x package manager. Now, you see that in the system installation, there is no python, and pip is actually an alias to pip3.

From the location and version shown above, we can see that pip, pip3, and python3 are all associated with the same version, in this case Python 3.12.4.

Note that in both conda and venv, both pip and python are aliases to the pip3 and python3.

pip is actually just a command line wrapper around the pip module, so a good precaution is to usepython3 -m pipto install packages. The commandpython3 -m pip` is always associated with the version of Python that you are currently using.

Installing and Upgrading Python

It’s very likely there is a newer version of Python available than is installed by default or was previously installed. The way you install Python depends on your operating system and command shell.

There are multiple ways to install.

Using python.org

Probably the easiest way is to download and install the MacOS install package from python.org downloads. This method allows you to install the latest version directly from the Python Software Foundation.

Caution

The installation from python.org will likely leave your old installation of Python and install another version in a different location.

Using Homebrew

Alternatively, you can use Homebrew to install and manage Python versions:

  1. Install Homebrew if you haven’t already:

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  2. Install Python:

    brew install python
  3. To update Python using Homebrew:

    brew update              # Update Homebrew itself
    brew upgrade python      # Upgrade Python to the latest version
  4. Verify the installed version:

    python3 --version
Note

Homebrew might not always have the absolute latest version available on python.org, but it provides an easy way to manage and update Python along with other packages.

On WSL Ubuntu, you can install Python from the package manager.

sudo apt update
sudo apt install python3

To see all available versions, you can use the apt list command.

apt list python3 -a

If there is a newer version available, you can upgrade Python by following these steps:

  1. Update your package list:

    sudo apt update
  2. Check the latest version available through apt:

    apt list python3 -a
  3. If a newer version is available, upgrade Python:

    sudo apt install --only-upgrade python3
  4. Verify the new version:

    python3 --version
  5. If you need a specific newer version not available in the default repositories, you can use a PPA:

    sudo add-apt-repository ppa:deadsnakes/ppa
    sudo apt update
    sudo apt install python3.x  # Replace x with the desired version number

Note: Using a PPA allows you to install versions of Python that may not be available in the standard Ubuntu repositories.

There are cases where you might end up with multiple versions of python installed simultaneously, which is fine, but just know that on both MacOS command shells and WSL Ubuntu, the first python executable found in the search path will be the one executed and the one shown from which python3.

Conda Installation

As mentioned above, we generally prefer the miniconda distribution of conda. You can find installation instructions at https://docs.anaconda.com/miniconda/.

It describes both the GUI and the command line installation methods.

Tip

Remember that for WSL Ubuntu, you’ll need use the command line installation method.

Conda Environment

Next step is to create a conda environment. See here for complete instructions.

conda create -n my_test_env

conda activate my_test_env

To see what packages are installed in the environment, you can use the conda list command.

conda list

What do you see in the list?

To deactivate the environment, you can use the conda deactivate command.

conda deactivate

To remove the environment, you can use:

conda remove -n my_test_env --all

Now let’s create a new environment with python 3.12.4 and see what packages are installed.

conda create -n my_test_env python=3.12.4

Now you can activate the environment and see what packages are installed.

conda activate my_test_env

conda list

How many packages do you see in the list?

Installing Packages

To install a package such as jupyter, you can use the conda install command.

conda install jupyter

You can also install a specific version of a package or major version of a package.

conda install jupyter=1.0.0

# install the latest minor version of the major version 1
conda install jupyter=1

To list the available versions of packages to install from conda, you can use the conda search command. Here’s how you can do it:

conda search package_name

For example, if you want to see all available versions of the jupyter package, you would run:

conda search jupyter

This will show you a list of all available versions of the package, along with information about which channel they’re from and which platforms they support.

If you want to see versions for a specific channel, you can use the --channel or -c option:

conda search --channel conda-forge jupyter

To see more details about a package, including its dependencies, you can use the --info flag:

conda search --info jupyter

If you want to see all packages that match a certain pattern, you can use wildcards:

conda search 'python=3.8*'

This would show all versions of Python 3.8.x available.

Remember, the available versions may depend on your current conda configuration, including which channels you have enabled. You might want to update your conda first to ensure you’re seeing the most recent information:

# make sure you are not in a conda environment
conda deactivate

conda update conda

These commands will help you find the specific versions of packages available for installation through conda.

Updating Packages

To check for updates and then update packages in a specific conda environment, you can follow these steps:

  1. First, activate the environment you want to check:
conda activate my_test_env
  1. To check for updates without actually installing them:
conda update --all --dry-run

This command will show you what packages would be updated without actually performing the update.

  1. If you’re satisfied with the proposed updates, you can perform the actual update:
conda update --all

This will update all packages in the current environment to their latest compatible versions.

Tip

You can also update specific packages by naming them:

conda update package1 package2
Note

Remember that conda will only update to versions that are compatible with other packages in your environment. Sometimes, major version updates might require manually specifying the new version or recreating the environment.

By regularly checking for and applying updates, you can ensure your environment has the latest features and security patches.

conda environment.yml

It’s often a good idea to make it easy for others to recreate your environment. You can do this by creating a environment.yml file.

conda env export > environment.yml

Then to create an environment from the environment.yml file, you can use the conda env create command.

conda env create -f environment.yml

Note that this will create a new environment with the same packages as the current environment and also the same name of the environment.

You can see which environments are available with the conda env list command.

Here’s a handy conda cheat sheet

venv and pip

Creating a Virtual Environment

To create a virtual environment using venv, you can use the following command:

python3 -m venv my_test_env

This creates a new virtual environment named my_test_env in the current directory.

Activating the Environment

To activate the environment:

source my_test_env/bin/activate
my_test_env\Scripts\activate

When activated, your command prompt should change to indicate the active environment.

Deactivating the Environment

To deactivate the environment:

deactivate

Installing Packages

To install a package such as jupyter, you can use the pip install command:

python -m pip install jupyter

You can also install a specific version of a package:

python -m pip install jupyter==1.0.0

# install the latest minor version of the major version 1
python -m pip install jupyter~=1.0.0

Listing Installed Packages

To see what packages are installed in the environment:

pip list

Checking for Package Updates

To check for updates to installed packages:

pip list --outdated

Updating Packages

To update a specific package:

python -m pip install --upgrade package_name

To update all packages:

pip list --outdated | cut -d ' ' -f1 | xargs -n1 pip install -U

Creating Requirements File

To create a requirements.txt file listing all installed packages:

pip freeze > requirements.txt

Installing from Requirements File

To install packages from a requirements.txt file:

python -m pip install -r requirements.txt

Removing the Environment

To remove the virtual environment, simply delete the environment folder:

rm -rf my_test_env
rmdir /s /q my_test_env

Remember to deactivate the environment before removing it.

Tip

Always use python -m pip instead of just pip to ensure you’re using the correct version of pip associated with your current Python environment.

A word on package caching

Note that pip and conda will cache packages, often in your home directory. This could be a problem because, for exeample, many versions of the same package can be cached and use up disk space. This can be especially problematic on SCC where you’re home directory is quite limited in size. We’ll talk about that when we cover SCC.

Exercise

Exercise Introduction

Let’s put what we’ve learned into practice. For this exercise, you’ll need to create a new virtual environment and install pytorch and related packages into the environment. You’ll save a snapshot of your environmenat as well. Then you’ll need to copy the python script from below into a file in your working directory. Run the script to train and evaluate the model. The script is a simple CNN for classifying CIFAR-10 images. It also writes out the trained model to a file. You’ll then need to upload your trained model file to Gradescope. You’ll also need to upload your environment.yml or requirements.txt file.

Exercise Instructions

  1. Create a directory for your exercise and cd into it.
  2. Create a new virtual environment with either venv or conda and then activate it.
  3. Install pytorch and related packages into the environment as described below.
  4. Copy the python script from below into a file in your working directory. The file should have extension .py such as cifar10.py.
  5. Run the script to train and evaluate the model. You could for example run it with python cifar10.py.
  6. Extra credit: Improve the model and/or training regime and re-run the script.
  7. Save your python environment into a environment.yml for conda or a requirements.txt for pip.
  8. Upload the .pt model file and your environment file to Gradescope. Enter your name for the leaderboard.

For extra credit, try to improve the accuracy of the model. The student who has the highest accuracy will receive 10 extra credit points and recognition in class!

Install PyTorch

From the pytorch website, the install instructions are:

conda install pytorch::pytorch torchvision -c pytorch
pip install torch torchvision
conda install pytorch torchvision cpuonly -c pytorch
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cpu

Python Script

import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

# Define transformations for the dataset
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Download and load the CIFAR-10 dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)

# Classes
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# Define the CNN model
class SmallCNN(nn.Module):
    def __init__(self):
        super(SmallCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 32 * 8 * 8)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Initialize the model, loss function, and optimizer
model = SmallCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Training the model
def train_model(model, trainloader, criterion, optimizer, epochs=5):
    for epoch in range(epochs):  # Loop over the dataset multiple times
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # Get the inputs; data is a list of [inputs, labels]
            inputs, labels = data

            # Zero the parameter gradients
            optimizer.zero_grad()

            # Forward pass
            outputs = model(inputs)
            loss = criterion(outputs, labels)

            # Backward pass and optimize
            loss.backward()
            optimizer.step()

            # Print statistics
            running_loss += loss.item()
            if i % 100 == 99:    # Print every 100 mini-batches
                print(f'[Epoch {epoch + 1}, Batch {i + 1}] loss: {running_loss / 100:.3f}')
                running_loss = 0.0

    print('Finished Training')

    # Using the TorchScript method for model saving
    # Important! Do not change the following 2 lines of code except for the model name
    scripted_model = torch.jit.script(model)
    torch.jit.save(scripted_model, 'cifar10-model.pt')

    print('Model saved as cifar10-model.pt')

# Call the training function
train_model(model, trainloader, criterion, optimizer)

# Evaluation function to test the accuracy
def evaluate_model(model, testloader):
    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print(f'Accuracy of the network on the 10,000 test images: {accuracy:.2f}%')
    return accuracy

# Call the evaluation function
evaluate_model(model, testloader)

Recap

Recap

In this module, we’ve covered:

  • Creating and managing python environments with conda and venv.
  • Installing packages with conda and pip.
  • Creating and managing environment.yml and requirements.txt files.
  • Training and evaluating a simple CNN on the CIFAR-10 dataset.
Back to top