When working on Python projects, it’s often necessary to identify and list all external dependencies so your code runs smoothly across different environments. This guide covers two ways to do it: using pipreqs, and writing your own Python functions. Both require some manual review afterwards to fix discrepancies. Here’s a detailed guide on how to accomplish this.

Method 1: Using pipreqs

What is pipreqs?

pipreqs is a tool that generates a requirements.txt file for your project by scanning your code for import statements. It’s a convenient way to list all the packages your project depends on.

Steps to Use pipreqs

  1. Install pipreqs:
   pip install pipreqs
  2. Generate requirements.txt:
    Navigate to your project’s root directory and run:
   pipreqs . --mode no-pin

The --mode no-pin option ensures that the versions of the packages are not pinned in the generated requirements.txt file.
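If you already have a pinned requirements.txt (for example, one generated without --mode no-pin), you can drop the pins after the fact. The following is a minimal sketch, assuming a simple file of one requirement per line; the helper names unpin() and unpin_file() are hypothetical and not part of pipreqs. It removes everything from the first PEP 440 comparison operator onward and discards environment markers.

```python
import re
from pathlib import Path

# PEP 440 version comparison operators; the first one found starts the pin.
_SPECIFIER = re.compile(r"\s*(===|==|!=|<=|>=|~=|<|>).*$")


def unpin(line: str) -> str:
    """Drop the version specifier (and any environment marker) from one line."""
    stripped = line.strip()
    # Leave blank lines, comments and pip options such as -r or -e untouched.
    if not stripped or stripped.startswith(("#", "-")):
        return stripped
    # Environment markers follow a ';' and are discarded along with the pin.
    requirement = stripped.split(";", 1)[0].strip()
    return _SPECIFIER.sub("", requirement)


def unpin_file(path: str = "requirements.txt") -> None:
    """Rewrite a requirements file in place with all pins removed."""
    file = Path(path)
    lines = file.read_text(encoding="utf-8").splitlines()
    file.write_text("\n".join(unpin(ln) for ln in lines) + "\n", encoding="utf-8")
```

For instance, unpin("requests==2.31.0") gives "requests", and unpin("uvicorn[standard]~=0.23") keeps the extras and gives "uvicorn[standard]".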

Manual Fixes

While pipreqs does a good job, it’s not perfect. You’ll need to manually review the generated requirements.txt to fix any issues. Common problems include:

  • Incorrect package names: Some packages are installed under a pip name that differs from the name used to import them. For example, azure-cognitiveservices-speech is imported as azure.cognitiveservices.speech in the source code, and pipreqs cannot restore the correct package name.
  • Missing packages: pipreqs might miss some packages, especially if they are conditionally imported or dynamically loaded.
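Dynamically loaded modules are easy to miss because they never appear in an import statement. As a rough heuristic for finding candidates to check by hand, the sketch below uses Python’s ast module to flag calls to importlib.import_module() or __import__() whose first argument is a string literal. The function name find_dynamic_imports is hypothetical, and the check is purely name-based: it matches any call named import_module or __import__, regardless of where it was imported from.

```python
import ast


def find_dynamic_imports(source: str) -> set:
    """Heuristically flag modules loaded by name, e.g. import_module("x")."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        # importlib.import_module(...) or any other obj.import_module(...)
        by_attr = isinstance(func, ast.Attribute) and func.attr == "import_module"
        # bare import_module(...) (after "from importlib import import_module")
        # or the builtin __import__(...)
        by_name = isinstance(func, ast.Name) and func.id in {"import_module", "__import__"}
        first = node.args[0]
        # Only string literals can be resolved statically; variables and
        # f-strings are skipped.
        name = first.value if isinstance(first, ast.Constant) else None
        # Names starting with "." are relative, i.e. local modules.
        if (by_attr or by_name) and isinstance(name, str) and name and not name.startswith("."):
            found.add(name.split(".")[0])
    return found


code = '''
import importlib
backend = importlib.import_module("ujson")
plugin = __import__("yaml.loader")
name = "numpy"
lib = importlib.import_module(name)
'''

print(sorted(find_dynamic_imports(code)))  # ['ujson', 'yaml']
```

Note that importlib.import_module(name) in the sample is not reported: when the module name is only known at runtime, static analysis cannot recover it, so those cases still need a manual check.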

Method 2: Writing Custom Python Functions

For most of us, pipreqs is efficient and good enough. But it is also fun to explore how we can parse the source code ourselves and identify the imported libraries. Below is a simple implementation to get you started.

Python Script to Identify Imports

import os
import sys
import ast


def list_dirs(path='.') -> set:
    '''List all directory names in the given path and its subdirectories'''
    all_dirs = set()
    for _, dirs, _ in os.walk(path):
        # skip hidden directories and __pycache__; pruning dirs in place
        # also stops os.walk from descending into them
        dirs[:] = [d for d in dirs
                   if not d.startswith('.') and d != '__pycache__']
        all_dirs.update(dirs)
    return all_dirs


def find_extern_imports(path) -> set:
    ''' Get all imported libs in a set '''
    imports = set()
    pyfiles = set()
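    for root, dirs, files in os.walk(path):
        # skip hidden directories (e.g. .git, .venv) and __pycache__,
        # mirroring list_dirs()
        dirs[:] = [d for d in dirs
                   if not d.startswith('.') and d != '__pycache__']
        for file in files:
            if file.endswith(".py"):
                # add pyfile name (without .py) to the set
                pyfiles.add(file[:-3])
                filepath = os.path.join(root, file)
                with open(filepath, "r", encoding="utf-8") as f:
                    # ast is abstract syntax tree
                    # https://dev.to/balapriya/abstract-syntax-tree-ast-explained-in-plain-english-1h38
                    # you may not need to fully understand
                    # each line of the following code
                    # as long as you know what it tries to achieve
                    node = ast.parse(f.read(), filename=filepath)
                    # node.body holds only the module's top-level statements
                    for n in node.body:
                        if isinstance(n, ast.Import):
                            for alias in n.names:
                                imports.add(alias.name.split('.')[0])
                        elif isinstance(n, ast.ImportFrom):
                            # skip relative imports: "from . import x" has
                            # n.module == None, and "from .utils import y"
                            # refers to a local module
                            if n.level == 0 and n.module:
                                imports.add(n.module.split('.')[0])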

    all_dirs = list_dirs(path)
    # remove pyfiles and dirs from imports
    imports = imports - pyfiles - all_dirs
    return imports


def get_syslibs() -> set:
    '''get all builtin and standard library modules'''

    # sys.builtin_module_names returns a tuple
    builtins = set(sys.builtin_module_names)

    # sys.stdlib_module_names returns a frozenset (Python 3.10+ only)
    stdlibs = set(sys.stdlib_module_names)

    syslibs = stdlibs.union(builtins)
    return syslibs


def get_pip_libs() -> set:
    ''' get all pip installed libs '''
    syslibs = get_syslibs()
    imports = find_extern_imports('.')
    piplibs = imports - syslibs
    return piplibs


def write_requirements(piplibs: set, filename: str):
    ''' write pip libs to the given file, one per line '''
    with open(filename, 'w') as f:
        for lib in sorted(piplibs):
            f.write(f"{lib}\n")


def main():
    ''' test cli '''
    piplibs = get_pip_libs()
    write_requirements(piplibs, 'requirements.in')
    print(f"The following packages were written to requirements.in: {piplibs}")


if __name__ == "__main__":
    main()

How It Works

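  1. find_extern_imports(path): This function traverses your project directory, identifies all Python files, and extracts the top-level names of imported libraries. It also excludes names that match the project’s own .py files and directories (collected by list_dirs()), since those are local modules rather than external packages.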
  2. get_syslibs(): This function returns a set of all built-in and standard library modules. Built-in modules are compiled into the Python interpreter itself and are usually written in C; standard library modules ship with every standard Python distribution. Neither needs to be installed with pip. Note that sys.stdlib_module_names is only available in Python 3.10 and later.
  3. get_pip_libs(): This function subtracts the set of system libraries from imported ones to identify external dependencies.
  4. main(): This function writes the identified pip libraries to requirements.in (one per line, via write_requirements()) and prints them.
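Because the script only loops over node.body, it never sees an import placed inside a function, an if block, or a try/except ImportError fallback. A minimal sketch of an alternative, assuming you want to catch those nested imports too, swaps node.body for ast.walk(), which visits every node in the tree. The function name imports_in_source is hypothetical; you could call it from find_extern_imports() with the contents of each file.

```python
import ast


def imports_in_source(source: str, filename: str = "<string>") -> set:
    """Return the top-level names of all absolute imports in the source."""
    imports = set()
    tree = ast.parse(source, filename=filename)
    # ast.walk visits every node in the tree, including statements nested
    # inside functions, classes, if-blocks and try/except, whereas
    # tree.body only holds the module's top-level statements.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imports.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) point at local modules, and
            # "from . import x" has no module name at all, so skip them.
            if node.level == 0 and node.module:
                imports.add(node.module.split(".")[0])
    return imports


sample = '''
import os
try:
    import yaml
except ImportError:
    yaml = None

def load():
    from requests.adapters import HTTPAdapter
    from . import helpers
    return HTTPAdapter
'''

print(sorted(imports_in_source(sample)))  # ['os', 'requests', 'yaml']
```

The trade-off is that optional imports, like the yaml fallback in the sample, are now reported as well, so you still decide by hand whether each one belongs in requirements.in.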

Manual Fixes

Just like with pipreqs, the output from this script requires manual adjustments. Our script assumes that every pip package name is identical to its import name, whereas pipreqs at least tries to map import names to the correct package names, although it can make mistakes.

  • Correct package names: Ensure the names match the ones used in pip install.
  • Complete package list: Verify that all required packages are listed, especially if some are conditionally imported.

Conclusion

Both methods provide a good starting point for identifying external dependencies in your Python project. pipreqs offers a quick solution but may need manual corrections. Writing custom Python functions gives you more control and understanding of the process but also requires manual adjustments. By combining these methods, you can efficiently manage and list your project’s dependencies.

Remember, regardless of the method you choose, always review and test the generated requirements.txt file to ensure it accurately reflects your project’s needs.
