filepattern.FilePattern

class FilePattern(path, pattern='', block_size='', recursive=False, suppress_warnings=False)

Bases: PatternObject

Class to create a FilePattern object.

This class takes the arguments path, pattern, block_size, and recursive, along with the optional suppress_warnings flag. The path may point to a directory, a text file, or a stitching vector. filepattern then matches the filenames in the directory, or each line of the text file, against the provided pattern.

The block_size parameter enables out-of-core processing, which consumes at most block_size of memory at a time.

The recursive parameter enables recursive iteration of subdirectories when a directory is passed as path. In this case filepattern will iterate over the subdirectories, storing filenames with the same basename in the same group.

Parameters
  • path (str) – Path to directory or text file

  • pattern (str) – Pattern to compare each filename to

  • block_size (str) – Maximum amount of RAM to consume at once. Defaults to “”.

  • recursive (bool) – Iterate over subdirectories. Defaults to False.
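To make the matching of filenames against a pattern more concrete, here is a minimal standalone sketch. It does not use filepattern and is not the library's implementation. The helper names pattern_to_regex and parse_filename are hypothetical, and the token meanings (d for one digit, c for one letter, a trailing + for one or more) are assumptions inferred from the example patterns on this page.

```python
import re

# Assumed token meanings (not confirmed by this page):
# 'd' = one digit, 'c' = one letter, trailing '+' = one or more.
TOKEN_REGEX = {"d": r"\d", "c": r"[A-Za-z]"}


def pattern_to_regex(pattern):
    """Translate a pattern such as 'img_r{r:ddd}.tif' into a compiled regex."""
    parts = []
    pos = 0
    for m in re.finditer(r"\{(\w+):\s*([dc]+\+?)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text
        name, spec = m.group(1), m.group(2)
        if spec.endswith("+"):
            body = TOKEN_REGEX[spec[0]] + "+"  # one or more of the token
        else:
            body = "".join(TOKEN_REGEX[ch] for ch in spec)  # fixed width
        parts.append(f"(?P<{name}>{body})")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def parse_filename(pattern, filename):
    """Return the captured variables as a dict, or None if there is no match."""
    m = pattern_to_regex(pattern).match(filename)
    if m is None:
        return None
    # Purely numeric captures are converted to int, everything else stays str.
    return {k: int(v) if v.isdigit() else v for k, v in m.groupdict().items()}


pattern = "img_r{r:ddd}_c{c:ddd}_{channel:c+}.tif"
print(parse_filename(pattern, "img_r001_c002_DAPI.tif"))
```

A filename that does not fit the pattern, for example one with only two digits after r, yields None in this sketch.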

__init__(path, pattern='', block_size='', recursive=False, suppress_warnings=False)

Constructor of the FilePattern class. The path argument can be a directory, a text file, or a stitching vector. Passing the optional block_size argument creates an ExternalFilePattern object, which processes the directory in blocks that each consume at most block_size of memory.

The path alone may be passed if the pattern is contained within the path. In this case, the names of the subdirectories are captured if they are named in the same manner as the pattern. For example, if only the path 'path/to/files/{channel: c+}/img_r{r:d+}_c{c:d+}.tif' is passed, the names of the channel subdirectories will be captured for each file.

Parameters
  • path (str) – Path to directory or text file

  • pattern (str) – Pattern to compare each filename to

  • block_size (str) – Maximum amount of RAM to consume at once. Defaults to “”.

  • recursive (bool) – Iterate over subdirectories. Defaults to False.

  • suppress_warnings (bool) – Set to True to suppress warnings printed to the console. Defaults to False.

Methods

__init__(path[, pattern, block_size, ...])

Constructor of the FilePattern class.

get_matching(**kwargs)

Get all filenames matching specific values

get_occurrences(**kwargs)

Takes a variable as the key and a list of values as the value, and returns a dictionary mapping the variable to a dictionary in which each value is mapped to its number of occurrences.

get_unique_values(*args)

Given variable names from the filepattern as arguments, this method returns a dictionary mapping each variable name to the set of unique values for that variable.

get_variables()

Returns a list of variables that are contained in the filepattern

output_name([files])

Returns a single filename that captures variables from a list of files.

__call__(group_by='', pydantic_output=False)

Iterate through files parsed using a filepattern

This method returns an iterable of filenames matched to the filepattern.

If a group_by variable is provided, lists of files where the variable is held constant are returned on each call.

Note that the group_by argument works inversely to the previous version of filepattern: the variable passed to group_by is held constant within each group, rather than the other variables being held constant.

Parameters
  • group_by (Union[str, List[str]]) – A string consisting of a single variable or a list of variables to group filenames by.

  • pydantic_output (bool) – Get Pydantic models as the output

Return type

Union[List[Tuple[List[Tuple[str, Union[str, int, float]]], List[Tuple[Dict[str, Union[int, float, str]], List[PathLike]]]]], Tuple[Dict[str, Union[int, float, str]], List[PathLike]]]
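The grouping behaviour described above can be pictured with a small standalone sketch. It works on hypothetical records shaped like the documented return type (a variable dictionary plus a list of paths) and does not call filepattern. The function group_records is an illustrative helper, not part of the library.

```python
from itertools import groupby

# Hypothetical records: (variable dict, list of paths).
records = [
    ({"r": 1, "c": 1}, ["img_r001_c001.tif"]),
    ({"r": 1, "c": 2}, ["img_r001_c002.tif"]),
    ({"r": 2, "c": 1}, ["img_r002_c001.tif"]),
    ({"r": 2, "c": 2}, ["img_r002_c002.tif"]),
]


def group_records(records, group_by):
    """Group records so the group_by variable(s) are constant within each group."""
    names = [group_by] if isinstance(group_by, str) else list(group_by)

    def key(rec):
        return tuple(rec[0][n] for n in names)

    groups = []
    # groupby only merges adjacent items, so sort by the same key first.
    for values, members in groupby(sorted(records, key=key), key=key):
        groups.append((list(zip(names, values)), list(members)))
    return groups


for constant, members in group_records(records, "r"):
    print(constant, [paths[0] for _, paths in members])
```

Each group pairs the held-constant variable values with the records that share them, mirroring the nested shape of the grouped return type.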

__getitem__(key)

Get slices of files that match the filepattern

Slices of files can be retrieved using the [] operator. Files can be accessed using a single index, such as fp[1], or slices of files, such as fp[:10], fp[1:10], or fp[1:2:10].

Parameters

key (int or slice) – Index of a file, or a slice of files

Return type

Union[List[Tuple[Dict[str, Union[int, float, str]], List[PathLike]]], Tuple[Dict[str, Union[int, float, str]], List[PathLike]]]

Returns

A single file for a single index, or a list of files for a slice.

get_matching(**kwargs)

Get all filenames matching specific values

This method returns a list containing all files where the variable matches the supplied value. For example, if the argument x=1 is passed to get_matching, all files where x is 1 will be returned. A list of values can also be passed, such as x=[1,2,3]. Furthermore, an arbitrary number of variables and values can be passed, such as x=1, y=2, z=3 or x=[1,2,3], y=['a', 'b', 'c'], z=[4, 5, 6].

Example

For a directory containing the files

```
img_r001_c001_DAPI.tif
img_r002_c001_DAPI.tif
img_r001_c001_TXREAD.tif
img_r002_c001_TXREAD.tif
img_r001_c001_GFP.tif
img_r002_c001_GFP.tif
```

The get_matching method can be used as:

```
import filepattern as fp

path = '/path/to/directory'
pattern = 'img_r{r:ddd}_c{c:ddd}_{channel:c+}.tif'

# Parse the directory, then keep only the TXREAD channel
files = fp.FilePattern(path, pattern)
matching = files.get_matching(channel=['TXREAD'])
```

The matching variable will be a list of matching files:

```
[({'c': 1, 'channel': 'TXREAD', 'r': 1}, ['/home/ec2-user/Dev/FilePattern/data/example/img_r001_c001_TXREAD.tif']),
 ({'c': 1, 'channel': 'TXREAD', 'r': 2}, ['/home/ec2-user/Dev/FilePattern/data/example/img_r002_c001_TXREAD.tif'])]
```

Parameters

**kwargs – One or more keyword arguments where the key is a variable contained in the filepattern and the value is a value for the variable. Use pydantic_output=True to get Pydantic models as the output.

Return type

List[Tuple[Dict[str, Union[int, float, str]], List[PathLike]]]

Returns

List of matching files
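The way several keyword arguments combine can be sketched without the library. The snippet below is an illustration under assumptions, not filepattern's implementation: it treats each keyword as a filter that must be satisfied (a scalar must equal the value, a list must contain it), and the helpers matches and filter_matching are hypothetical names.

```python
# Hypothetical records for the six example files: (variable dict, list of paths).
records = [
    ({"r": r, "c": 1, "channel": ch}, [f"img_r{r:03d}_c001_{ch}.tif"])
    for ch in ("DAPI", "TXREAD", "GFP")
    for r in (1, 2)
]


def matches(variables, **criteria):
    """True if every criterion is satisfied by the record's variables."""
    for name, wanted in criteria.items():
        # A scalar is treated as a one-element list of allowed values.
        allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
        if variables.get(name) not in allowed:
            return False
    return True


def filter_matching(records, **criteria):
    """Return the records whose variables satisfy all criteria."""
    return [rec for rec in records if matches(rec[0], **criteria)]


print(filter_matching(records, channel=["TXREAD"]))
print(filter_matching(records, r=1, channel=["DAPI", "GFP"]))
```

In this sketch, criteria on different variables are combined with a logical AND, while the values in a list for one variable act as alternatives.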

get_occurrences(**kwargs)

Takes a variable as the key and a list of values as the value, and returns a dictionary mapping the variable to a dictionary in which each value is mapped to its number of occurrences.

For example, if the filepattern is img_r{r:ddd}_c{c:ddd}.tif and r=1 occurs 20 times in the path, then passing r=[1] will return {'r': {1: 20}}.

Parameters
  • **kwargs – Each keyword argument must be a variable. If no arguments are supplied, the occurrences for every variable will be returned.

Return type

Dict[str, Dict[Union[int, float, str], int]]

Returns

Dictionary of variables mapped to values where each value is mapped to the number of occurrences.
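The shape of the returned dictionary can be reproduced with a short standalone sketch built on collections.Counter. It operates on hypothetical records rather than on a FilePattern object, and count_occurrences is an illustrative helper, not the library's code.

```python
from collections import Counter

# Hypothetical records in which r=1 occurs 20 times and r=2 occurs 5 times.
records = (
    [({"r": 1, "c": c}, [f"img_r001_c{c:03d}.tif"]) for c in range(1, 21)]
    + [({"r": 2, "c": c}, [f"img_r002_c{c:03d}.tif"]) for c in range(1, 6)]
)


def count_occurrences(records, **kwargs):
    """Map each requested variable to {value: number of occurrences}."""
    if kwargs:
        names = list(kwargs)
    else:
        # No arguments: report every variable found in the records.
        names = sorted({n for variables, _ in records for n in variables})
    result = {}
    for name in names:
        counts = Counter(v[name] for v, _ in records if name in v)
        wanted = kwargs.get(name)
        if wanted is not None:
            if not isinstance(wanted, (list, tuple, set)):
                wanted = [wanted]
            # Restrict the counts to the requested values only.
            counts = {value: counts[value] for value in wanted}
        result[name] = dict(counts)
    return result


print(count_occurrences(records, r=[1]))
```

Called without arguments, the sketch reports counts for every variable, matching the behaviour described for the parameters above.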

get_unique_values(*args)

Given variable names from the filepattern as arguments, this method returns a dictionary mapping each variable name to the set of unique values for that variable. If no variables are provided, all variables will be returned.

For example, if the filepattern is img_r{r:ddd}_c{c:ddd}.tif and r ranges from 1 to 3 and c ranges from 1 to 2, then fp_object.get_unique_values('r', 'c') will return {'r': {1, 2, 3}, 'c': {1, 2}}.

Parameters

*args – Variables to get the unique values of. All variables will be returned if no arguments are provided.

Return type

Dict[str, Set[Union[int, float, str]]]

Returns

Dictionary of variables mapped to values.
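As a rough illustration of the result, the following sketch collects unique values from hypothetical records using set comprehensions. The helper unique_values is an assumed name and only mimics the documented output shape.

```python
# Hypothetical records where r ranges from 1 to 3 and c from 1 to 2.
records = [
    ({"r": r, "c": c}, [f"img_r{r:03d}_c{c:03d}.tif"])
    for r in (1, 2, 3)
    for c in (1, 2)
]


def unique_values(records, *names):
    """Map each variable name to the set of values it takes in the records."""
    if not names:
        # No names given: include every variable found in the records.
        names = sorted({n for variables, _ in records for n in variables})
    return {
        name: {variables[name] for variables, _ in records if name in variables}
        for name in names
    }


print(unique_values(records, "r", "c"))
```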

get_variables()

Returns a list of variables that are contained in the filepattern

For example, if the filepattern is img_x{x:d}_y{y:d}_c{c:c+}.tif, get_variables will return the list ['x', 'y', 'c'].

Return type

List[str]

Returns

List containing the variables in the filepattern

output_name(files=[])

Returns a single filename that captures variables from a list of files.

Given a list of files, this method will return a single filename that captures the variables from each file in the list. If a variable is constant through the list, the variable value will be in the returned name. If a variable is not constant, the minimum and maximum values will appear in the returned name in the form “(min-max)”.

Parameters

files (list) – List of files to get a single filename of.

Return type

str

Returns

A string that captures the variable values from each file in files.
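The constant versus "(min-max)" rule can be sketched as follows. This is an illustration only: the helper summarize_name, the str.format-style template, and the three-digit zero padding are all assumptions made for the example, and the exact formatting produced by output_name may differ.

```python
def summarize_name(template, variable_dicts, width=3):
    """Build one name from many files' variables (hypothetical helper).

    template uses str.format fields, e.g. 'img_r{r}_c{c}.tif'.
    """

    def fmt(value):
        # Assumed formatting: integers are zero-padded to `width` digits.
        return f"{value:0{width}d}" if isinstance(value, int) else str(value)

    fields = {}
    for name in variable_dicts[0]:
        values = [d[name] for d in variable_dicts]
        lo, hi = min(values), max(values)
        # Constant variable: use its value. Otherwise: use the (min-max) form.
        fields[name] = fmt(lo) if lo == hi else f"({fmt(lo)}-{fmt(hi)})"
    return template.format(**fields)


variables = [{"r": 1, "c": 1}, {"r": 1, "c": 2}, {"r": 1, "c": 3}]
print(summarize_name("img_r{r}_c{c}.tif", variables))
```

Here r is constant across the list and appears as a single value, while c varies and is collapsed into a range.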