The PIFiA dataset includes images from high-throughput screens of the yeast ORF-GFP collection, combined with
single-cell analysis of subcellular localization patterns obtained with PIFiA, a self-supervised deep neural
network described in Razdaibiedina et al. 2023. The dataset contains images of 4049 strains, each expressing a
GFP-tagged protein visible above background fluorescence, acquired with an automated confocal microscope. Cells
were imaged in two biological replicates, with four fields of view per replicate for each GFP-tagged strain.
All screens are available on TheCellVision.org together with single-cell micrographs and the results of PIFiA
downstream analyses of subcompartmental localization patterns.
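The replicate and field-of-view layout described above determines how many fields of view exist per strain. The sketch below enumerates them. The (strain, replicate, field) key is a hypothetical illustration, not the file naming used on TheCellVision.org, and image channels are not counted.

```python
# Layout as described above: two biological replicates per GFP-tagged strain,
# four fields of view per replicate.
N_STRAINS = 4049
N_REPLICATES = 2
FIELDS_PER_REPLICATE = 4


def image_keys(strain_ids):
    """Yield one (strain, replicate, field) key per field of view.

    The key format is hypothetical and only illustrates the nesting.
    """
    for strain in strain_ids:
        for replicate in range(1, N_REPLICATES + 1):
            for field in range(1, FIELDS_PER_REPLICATE + 1):
                yield (strain, replicate, field)


images_per_strain = N_REPLICATES * FIELDS_PER_REPLICATE
total_fields = N_STRAINS * images_per_strain
print(images_per_strain, total_fields)  # 8 32392
```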
The main page of the PIFiA project on TheCellVision.org provides an interactive t-SNE map of the 4049 proteins,
computed from the PIFiA features obtained across the whole proteome. Each point on the map represents a protein
and can be selected interactively; selecting a point opens a tab with the protein description and the PIFiA
downstream analysis results. Points are colored according to the 15 localization categories defined by
Huh et al., 2003 (left legend), and protein complexes can be selected to be highlighted on the map (right legend).
Selecting a protein on the map, or entering its name in the search box, displays its results in three tabs:
- Description
  - Standard name: Official or standardized name of the protein
  - ORF: Open Reading Frame, the DNA sequence that potentially encodes the protein
  - Aliases: Other names or identifiers used to refer to the protein
  - Human Ortholog: Equivalent protein in humans, if known
  - Description: Brief summary of the protein's function according to the Saccharomyces Genome Database
    (https://www.yeastgenome.org/)
  - Localization: Subcellular localization of the protein, as defined by the PIFiA standard in
    Razdaibiedina et al., 2024
  - Localization Type: Homogeneity of the protein's subcellular localization (e.g., homogeneous; mixed
    OR-type; mixed AND-type), defined by the percentage of cells that exhibit localization heterogeneity
  - Cell Percentages: Percentage of cells in which the protein shows its predominant localization
  - Cell Cycle Regulation: Any information on how the protein's localization or function may vary during
    the cell cycle
  - Subcompartmental Group: Category indicating the specific sub-compartment of the organelle in which the
    protein is localized (e.g., nucleus-5; cytoplasm-2), from Razdaibiedina et al., 2024
- Images
  - Protein name: Official or standardized name of the protein shown in the GFP screen
  - Replicate: Biological replicate (1 or 2) from which the displayed images are taken
- Analysis
  - Nearest neighbours: Proteins whose PIFiA feature profiles are most similar to that of the queried
    protein, i.e., proteins with similar subcellular localization patterns
  - Correlation threshold: Minimum correlation between feature profiles for a protein to be reported as a
    neighbour; adjust it to make the comparison more or less stringent
  - Show neighbors on t-SNE: Highlights the nearest neighbours on the whole-proteome t-SNE map, showing where
    they lie relative to the rest of the proteome
  - Enrichment analyses based on Gene Ontology (GO): Results of enrichment analyses that identify significant
    associations between the nearest neighbours and the biological processes, molecular functions, and
    cellular components defined by GO terms. Results are shown in tables with the following columns:
    - GO term: Identifier of the GO term
    - Term name: Descriptive name of the GO term
    - Overlap: Number of nearest-neighbour genes annotated with the GO term
    - P-value: Probability of observing an overlap at least this large by chance
    - Adjusted p-value: P-value adjusted for multiple hypothesis testing to control false positives
    - -log10 adjusted p-value: Negative base-10 logarithm of the adjusted p-value; larger values indicate
      stronger significance
    - Fold enrichment: Ratio of the observed overlap to the overlap expected by chance
    - Genes: Nearest-neighbour genes annotated with the GO term, i.e., the molecular components contributing
      to the enriched biological process, molecular function, or cellular component
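The Cell Percentages field in the Description tab reports the share of cells that show the predominant localization. A minimal sketch, assuming a hypothetical list of per-cell localization calls for one protein; how PIFiA actually produces and stores per-cell calls is not described here.

```python
from collections import Counter


def predominant_localization(cell_labels):
    """Return (localization, percentage of cells) for the most common per-cell label.

    `cell_labels` is a hypothetical list of per-cell localization calls.
    """
    if not cell_labels:
        raise ValueError("need at least one cell")
    label, n = Counter(cell_labels).most_common(1)[0]
    return label, 100.0 * n / len(cell_labels)


cells = ["nucleus"] * 72 + ["cytoplasm"] * 28
print(predominant_localization(cells))  # ('nucleus', 72.0)
```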
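The Nearest neighbours list and its Correlation threshold in the Analysis tab can be pictured as a correlation filter over per-protein feature profiles. This sketch assumes Pearson correlation and uses toy four-dimensional profiles; the correlation measure, profile dimensionality, and ranking used by the site are assumptions, not taken from PIFiA.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def nearest_neighbours(query, profiles, threshold):
    """Return (protein, r) pairs with r >= threshold, most similar first."""
    q = profiles[query]
    hits = []
    for name, vec in profiles.items():
        if name == query:
            continue  # a protein is not its own neighbour
        r = pearson(q, vec)
        if r >= threshold:
            hits.append((name, r))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical feature profiles, not real PIFiA features.
profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.1],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [1.0, 2.5, 2.9, 4.2],
}
print([name for name, _ in nearest_neighbours("A", profiles, threshold=0.9)])  # ['B', 'D']
```

Raising the threshold to 0.99 in this toy example leaves only B, which mirrors how a stricter threshold shortens the neighbour list.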
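The columns of the GO enrichment table are related to one another, as the worked sketch below shows. It assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), Benjamini-Hochberg correction, and a background of all 4049 proteins in the dataset. The text above does not state which test, correction, or background the site uses, so treat these as illustrative choices; the example counts are hypothetical.

```python
import math
import sys


def hypergeom_pvalue(k, N, K, n):
    """One-sided P(X >= k), X ~ Hypergeometric(N background genes, K annotated, n drawn)."""
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / math.comb(N, n)


def fold_enrichment(k, N, K, n):
    """Observed overlap k divided by the expected overlap n * K / N."""
    return (k * N) / (n * K)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def neg_log10(p):
    # Floor at the smallest normal float so that p == 0.0 does not raise.
    return -math.log10(max(p, sys.float_info.min))


# Hypothetical: 20 neighbours, a GO term annotating 50 of 4049 background
# proteins, 5 of which are among the neighbours.
N, n, K, k = 4049, 20, 50, 5
print(hypergeom_pvalue(k, N, K, n))
print(fold_enrichment(k, N, K, n))  # 20.245
print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03])])  # [0.03, 0.04, 0.04]
```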