filepattern.FilePattern
- class FilePattern(path, pattern='', block_size='', recursive=False, suppress_warnings=False)
Bases: PatternObject
Class to create a FilePattern object.
This class takes five arguments: path, pattern, block_size, recursive, and suppress_warnings. For path, a path to a directory, a text file, or a stitching vector may be provided. filepattern will then match the filenames in the directory, or each line of the text file, to the provided pattern.
The block_size parameter allows for out-of-core processing, which consumes at most block_size of memory.
The recursive parameter enables recursive iteration over subdirectories when a directory is passed as path. In this case, filepattern will iterate over the subdirectories, storing filenames with the same basename in the same group.
- Parameters
path (str) – Path to directory or text file
pattern (str) – Pattern to compare each filename to
block_size (str) – Maximum amount of RAM to consume at once. Defaults to ''.
recursive (bool) – Iterate over subdirectories. Defaults to False.
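To make the matching step more concrete, the following is a minimal pure-Python sketch of how a pattern could be translated into a regular expression and applied to a filename. It is not the library's implementation. It assumes that d stands for one digit, c for one letter, and a trailing + for one or more of the preceding symbol; the helper names pattern_to_regex and parse are hypothetical.

```python
import re

# Assumed meaning of the pattern symbols (an assumption for illustration):
# 'd' = one digit, 'c' = one letter, '+' = one or more of the preceding symbol.
_SYMBOLS = {"d": r"[0-9]", "c": r"[A-Za-z]"}


def pattern_to_regex(pattern):
    """Translate an 'img_r{r:ddd}.tif' style pattern into a compiled regex."""
    parts = []
    pos = 0
    for m in re.finditer(r"\{(\w+):([dc]+\+?)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text
        name, spec = m.group(1), m.group(2)
        body = "".join("+" if ch == "+" else _SYMBOLS[ch] for ch in spec)
        parts.append(f"(?P<{name}>{body})")  # one named group per variable
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


def parse(pattern, filename):
    """Return the captured variables (digit strings become int) or None."""
    m = pattern_to_regex(pattern).fullmatch(filename)
    if m is None:
        return None
    return {k: int(v) if v.isdigit() else v for k, v in m.groupdict().items()}


print(parse("img_r{r:ddd}_c{c:ddd}_{channel:c+}.tif", "img_r001_c001_DAPI.tif"))
```

A filename that does not fit the pattern yields None in this sketch, which mirrors the idea that only matching filenames are kept.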
- __init__(path, pattern='', block_size='', recursive=False, suppress_warnings=False)
Constructor of the FilePattern class. The path argument can be a directory, a text file, or a stitching vector. Passing the optional block_size argument creates an ExternalFilePattern object, which processes the directory in blocks that each consume at most block_size of memory.
The path alone may be passed if the pattern is contained within the path. In this case, the names of the subdirectories are captured if they are named in the same manner as the pattern. For example, if just the path 'path/to/files/{channel: c+}/img_r{r:d+}_c{c:d+}.tif' is passed, the names of the channel subdirectories will be captured for each file.
- Parameters
path (str) – Path to directory or text file
pattern (str) – Pattern to compare each filename to
block_size (str) – Maximum amount of RAM to consume at once. Defaults to ''.
recursive (bool) – Iterate over subdirectories. Defaults to False.
suppress_warnings – True to suppress warnings printed to the console. Defaults to False.
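One way to picture the path-only form is to split the combined string at the first path component that contains a variable. The sketch below illustrates that idea under stated assumptions; the helper split_path_and_pattern is hypothetical and does not reflect how the library parses its input.

```python
from pathlib import PurePosixPath


def split_path_and_pattern(combined):
    """Split at the first path component that contains a '{' variable."""
    parts = PurePosixPath(combined).parts
    for i, part in enumerate(parts):
        if "{" in part:
            return str(PurePosixPath(*parts[:i])), "/".join(parts[i:])
    return combined, ""  # no variables found: treat everything as the path


base, pattern = split_path_and_pattern(
    "path/to/files/{channel: c+}/img_r{r:d+}_c{c:d+}.tif"
)
print(base)
print(pattern)
```

Here base holds the fixed directory prefix and pattern holds the remainder, including the subdirectory component whose name would be captured as channel.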
Methods
__init__(path[, pattern, block_size, ...]) – Constructor of the FilePattern class.
get_matching(**kwargs) – Get all filenames matching specific values.
get_occurrences(**kwargs) – Takes a variable as the key and a list of values as the value, and returns a dictionary mapping the variable to a dictionary of the values mapped to the number of occurrences of each value.
get_unique_values(*args) – Given variable names from the filepattern as arguments, returns a dictionary mapping the variable names to a set of the unique values for each variable.
get_variables() – Returns a list of variables that are contained in the filepattern.
output_name([files]) – Returns a single filename that captures variables from a list of files.
- __call__(group_by='', pydantic_output=False)
Iterate through files parsed using a filepattern
This method returns an iterable of filenames matched to the filepattern.
If a group_by variable is provided, lists of files where the variable is held constant are returned on each call.
Note that the group_by argument works in the inverse of the previous version of filepattern. The variable passed to group_by will be held constant rather than the other variables remaining constant.
- Parameters
group_by (Union[str, List[str]]) – A string consisting of a single variable or a list of variables to group filenames by.
pydantic_output (bool) – Get Pydantic models as the output
- Return type
Union[List[Tuple[List[Tuple[str, Union[str, int, float]]], List[Tuple[Dict[str, Union[int, float, str]], List[PathLike]]]]], Tuple[Dict[str, Union[int, float, str]], List[PathLike]]]
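The grouping behaviour described above, where the variable passed to group_by is held constant within each group, can be sketched in plain Python. This is an illustration of the semantics only, not the library's code; the sample data and the helper group_files are hypothetical.

```python
from itertools import groupby

# Hypothetical parsed files in the (variables, [paths]) shape shown in this reference.
files = [
    ({"r": 1, "c": 1, "channel": "DAPI"}, ["img_r001_c001_DAPI.tif"]),
    ({"r": 1, "c": 1, "channel": "GFP"}, ["img_r001_c001_GFP.tif"]),
    ({"r": 2, "c": 1, "channel": "DAPI"}, ["img_r002_c001_DAPI.tif"]),
    ({"r": 2, "c": 1, "channel": "GFP"}, ["img_r002_c001_GFP.tif"]),
]


def group_files(files, group_by):
    """Yield ([(variable, value)], [files]) with `group_by` held constant."""
    def key(item):
        return item[0][group_by]

    # groupby only merges adjacent items, so sort by the same key first.
    for value, members in groupby(sorted(files, key=key), key=key):
        yield [(group_by, value)], list(members)


groups = list(group_files(files, "channel"))
for label, members in groups:
    print(label, [paths[0] for _, paths in members])
```

Each yielded item pairs the constant variable and its value with the files in that group, which corresponds to the first branch of the return type listed above.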
- __getitem__(key)
Get slices of files that match the filepattern
Slices of files can be retrieved using the [] operator. Files can be accessed using a single index, such as fp[1], or slices of files, such as fp[:10], fp[1:10], or fp[1:2:10].
- Parameters
key (int) – Index of file
- Return type
Union[List[Tuple[Dict[str, Union[int, float, str]], List[PathLike]]], Tuple[Dict[str, Union[int, float, str]], List[PathLike]]]
- Returns
A single file for a single index, or a list of files for a slice.
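The difference between the two return shapes can be illustrated with a small stand-in class. FileList below is hypothetical and simply delegates to a Python list; it only shows that an integer index yields one (variables, [paths]) tuple while a slice yields a list of them.

```python
class FileList:
    """Hypothetical stand-in that mimics index and slice access."""

    def __init__(self, files):
        self._files = list(files)

    def __getitem__(self, key):
        # An int returns one (variables, [paths]) tuple; a slice returns a list.
        return self._files[key]


fp = FileList(({"r": r}, [f"img_r{r:03d}.tif"]) for r in range(1, 6))

single = fp[1]           # one file
first_three = fp[:3]     # list of three files
every_other = fp[0:5:2]  # start:stop:step slice

print(single)
print(len(first_three), [f[0]["r"] for f in every_other])
```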
- get_matching(**kwargs)
Get all filenames matching specific values
This method returns a list containing all files where the variable matches the supplied value. For example, if the argument x=1 is passed to get_matching, all files where x is 1 will be returned. A list of values can also be passed, such as x=[1, 2, 3]. Furthermore, an arbitrary number of variables and values can be passed, such as x=1, y=2, z=3 or x=[1, 2, 3], y=['a', 'b', 'c'], z=[4, 5, 6].
Example
For a directory containing the files
```
img_r001_c001_DAPI.tif
img_r002_c001_DAPI.tif
img_r001_c001_TXREAD.tif
img_r002_c001_TXREAD.tif
img_r001_c001_GFP.tif
img_r002_c001_GFP.tif
```
The get_matching method can be used as:
```
import filepattern as fp

# Directory containing the files listed above
path = '/path/to/directory'
pattern = 'img_r{r:ddd}_c{c:ddd}_{channel:c+}.tif'
files = fp.FilePattern(path, pattern)
# Keep only the files whose channel variable is TXREAD
matching = files.get_matching(channel=['TXREAD'])
```
The matching variable will be a list of matching files:
```
[({'c': 1, 'channel': 'TXREAD', 'r': 1}, ['/home/ec2-user/Dev/FilePattern/data/example/img_r001_c001_TXREAD.tif']), ({'c': 1, 'channel': 'TXREAD', 'r': 2}, ['/home/ec2-user/Dev/FilePattern/data/example/img_r002_c001_TXREAD.tif'])]
```
- Parameters
**kwargs – One or more keyword arguments where the key is a variable contained in the filepattern and the value is a value for the variable. Use pydantic_output=True to get Pydantic models as the output.
- Return type
List[Tuple[Dict[str, Union[int, float, str]], List[PathLike]]]
- Returns
List of matching files
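The filtering rule, where each keyword may be a single value or a list of accepted values and all keywords must be satisfied, can be sketched without the library. The helper match_files and the sample data are hypothetical and only illustrate the semantics described above.

```python
# Hypothetical parsed files in the (variables, [paths]) shape used by this reference.
files = [
    ({"r": 1, "c": 1, "channel": "DAPI"}, ["img_r001_c001_DAPI.tif"]),
    ({"r": 2, "c": 1, "channel": "DAPI"}, ["img_r002_c001_DAPI.tif"]),
    ({"r": 1, "c": 1, "channel": "TXREAD"}, ["img_r001_c001_TXREAD.tif"]),
    ({"r": 2, "c": 1, "channel": "TXREAD"}, ["img_r002_c001_TXREAD.tif"]),
]


def match_files(files, **kwargs):
    """Keep files whose variables satisfy every keyword argument."""
    def accepts(variables):
        for name, wanted in kwargs.items():
            # A scalar is treated as a one-element list of allowed values.
            allowed = wanted if isinstance(wanted, list) else [wanted]
            if variables.get(name) not in allowed:
                return False
        return True

    return [item for item in files if accepts(item[0])]


matching = match_files(files, channel=["TXREAD"], r=1)
print(matching)
```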
- get_occurrences(**kwargs)
Takes a variable as the key and a list of values as the value, and returns a dictionary mapping the variable to a dictionary of the values mapped to the number of occurrences of each value.
For example, if the filepattern is img_r{r:ddd}_c{c:ddd}.tif and r=1 occurs 20 times in the path, then passing r=[1] will return {'r': {1: 20}}.
- Parameters
**kwargs – Each keyword argument must be a variable. If no arguments are supplied, the occurrences for every variable will be returned.
- Return type
Dict[str, Dict[Union[int, float, str], int]]
- Returns
Dictionary of variables mapped to values where each value is mapped to the number of occurrences.
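The counting described above can be approximated with collections.Counter. The sketch below is an assumption-laden illustration rather than the library's code: count_occurrences is a hypothetical helper, and reporting a requested value that never occurs with a count of 0 is a choice made here, not documented behaviour.

```python
from collections import Counter

# Hypothetical parsed files in the (variables, [paths]) shape used by this reference.
files = [
    ({"r": 1, "c": 1}, ["img_r001_c001.tif"]),
    ({"r": 1, "c": 2}, ["img_r001_c002.tif"]),
    ({"r": 2, "c": 1}, ["img_r002_c001.tif"]),
]


def count_occurrences(files, **kwargs):
    """Map each variable to {value: number of files with that value}."""
    # With no keywords, count every variable that appears in the files.
    requested = kwargs or {name: None for variables, _ in files for name in variables}
    result = {}
    for name, values in requested.items():
        counter = Counter(v[name] for v, _ in files if name in v)
        if values is None:
            result[name] = dict(counter)
        else:
            # Assumption: values that never occur are reported with a count of 0.
            result[name] = {value: counter.get(value, 0) for value in values}
    return result


print(count_occurrences(files, r=[1]))
print(count_occurrences(files))
```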
- get_unique_values(*args)
Given variable names from the filepattern as arguments, this method returns a dictionary mapping the variable names to a set of the unique values for each variable. If no variables are provided, all variables will be returned.
For example, if the filepattern is img_r{r:ddd}_c{c:ddd}.tif, r ranges from 1 to 3, and c ranges from 1 to 2, then fp_object.get_unique_values('r', 'c') will return {'r': {1, 2, 3}, 'c': {1, 2}}.
- Parameters
*args – Variables to get the unique values of. All variables will be returned if no arguments are provided.
- Return type
Dict[str, Set[Union[int, float, str]]]
- Returns
Dictionary of variables mapped to values.
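The same result shape can be reproduced with set comprehensions over already-parsed files. The helper unique_values and the generated sample data are hypothetical; the sketch only illustrates the mapping from variable names to sets of values.

```python
# Hypothetical parsed files: r ranges from 1 to 3 and c ranges from 1 to 2.
files = [
    ({"r": r, "c": c}, [f"img_r{r:03d}_c{c:03d}.tif"])
    for r in range(1, 4)
    for c in range(1, 3)
]


def unique_values(files, *names):
    """Map each requested variable (or all of them) to its set of values."""
    wanted = names or sorted({n for variables, _ in files for n in variables})
    return {
        name: {variables[name] for variables, _ in files if name in variables}
        for name in wanted
    }


print(unique_values(files, "r", "c"))
```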
- get_variables()
Returns a list of variables that are contained in the filepattern.
For example, if the filepattern is img_x{x:d}_y{y:d}_c{c:c+}.tif, get_variables will return the list ['x', 'y', 'c'].
- Return type
List[str]
- Returns
List containing the variables in the filepattern
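Extracting the variable names from a pattern string can be sketched with a single regular expression. The helper pattern_variables is hypothetical and assumes every variable is written as {name:spec}; it is not how the library stores its variables.

```python
import re


def pattern_variables(pattern):
    """Return the variable names in the order they appear in the pattern."""
    # Each variable is assumed to look like {name:spec}.
    return re.findall(r"\{(\w+):[^}]*\}", pattern)


variables = pattern_variables("img_x{x:d}_y{y:d}_c{c:c+}.tif")
print(variables)  # ['x', 'y', 'c']
```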
- output_name(files=[])
Returns a single filename that captures variables from a list of files.
Given a list of files, this method will return a single filename that captures the variables from each file in the list. If a variable is constant through the list, the variable value will be in the returned name. If a variable is not constant, the minimum and maximum values will appear in the returned name in the form “(min-max)”.
- Parameters
files (list) – List of files to get a single filename of.
- Return type
str
- Returns
A string that captures the variable values from each file in files.
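The naming rule above can be illustrated with a short sketch that works on already-parsed integer variables. Everything here is an assumption made for illustration: the helper summarize_name is hypothetical, the template uses Python format fields rather than filepattern syntax, and the three-digit zero padding is chosen to match the sample names.

```python
def summarize_name(template, variable_sets, width=3):
    """Fill `template` with one value, or '(min-max)', for each variable."""
    fields = {}
    for name in variable_sets[0]:
        values = [variables[name] for variables in variable_sets]
        lo, hi = min(values), max(values)
        if lo == hi:
            # Constant across the list: use the value itself.
            fields[name] = f"{lo:0{width}d}"
        else:
            # Varies across the list: use the (min-max) form.
            fields[name] = f"({lo:0{width}d}-{hi:0{width}d})"
    return template.format(**fields)


name = summarize_name(
    "img_r{r}_c{c}.tif",
    [{"r": 1, "c": 1}, {"r": 2, "c": 1}, {"r": 3, "c": 1}],
)
print(name)  # img_r(001-003)_c001.tif
```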