RCSB PDB Help

Overview: Advanced Search


Video: Advanced Search and Grouping


Introduction

What is Advanced Search?

The Advanced Search tool allows users to build precise and complex queries. Unlike the Basic Search, which is broad and text-based, Advanced Search gives you fine-grained control over which data fields to search, the type of search to perform, and how results are combined.

Why use Advanced Search?

Use Advanced Search when you need specific results that cannot be obtained with Basic Search, want to combine multiple criteria in a single query, or want to use specialized tools that search PDB entries or Computed Structure Models (CSMs) by sequence similarity, 3D similarity, or motif.

Key Features

  • Attribute-specific searches: Target specific annotations associated with macromolecular structures. This includes annotations defined in the mmCIF dictionary (e.g. Source Organism), external value-added annotations (e.g. Enzyme Classification, Gene Ontology), and ligand-specific information (e.g. Chemical Name).
  • Boolean operators: Combine multiple criteria using AND, OR, NOT for precise queries.
  • Hybrid searches: Combine text-based, sequence-based, and structure-based searches in a single query.

Documentation

Search Tools

Structure Attributes

The Structure Attributes search finds biomolecular structures by querying dedicated annotation fields (attributes). Unlike free-text searches, which scan all text fields, this search focuses on structured metadata curated in the PDB archive or CSM datasets. Learn more.

What You Can Search For

  • Experimental method – e.g., X-ray crystallography, NMR, cryo-EM.
  • Resolution – numerical filters for X-ray structures.
  • Quality metrics – e.g., R-free, R-work, Q-score, clashscore, outliers.
  • Polymer type and composition – protein, DNA, RNA, or hybrid complexes.
  • Molecular weight / chain length – number of residues, number of chains.
  • Ligand presence – structures containing specific small molecules.

How It Works

  • Users select the structural attribute(s) they are interested in.
  • The search uses curated fields specifically designed to capture this information, ensuring accurate results.
  • Multiple attributes can be combined using AND or OR logic to refine results.
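
The same attribute-plus-logic structure can also be written down programmatically. The sketch below builds a query body in the style of the RCSB PDB Search API, assuming a JSON layout of "terminal" (single condition) and "group" (AND/OR) nodes. The attribute names (exptl.method, rcsb_entry_info.resolution_combined), the operator names, and the two helper functions are assumptions for illustration and should be checked against the Search API documentation before use. The code only builds and prints the JSON; it does not send a request.

```python
import json


def attribute_node(attribute, operator, value):
    """One attribute condition (a 'terminal' node in the assumed layout)."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def group_node(logical_operator, nodes):
    """Combine conditions with 'and' or 'or'."""
    if logical_operator not in ("and", "or"):
        raise ValueError("logical_operator must be 'and' or 'or'")
    return {"type": "group", "logical_operator": logical_operator, "nodes": list(nodes)}


# X-ray structures at 2.0 angstrom resolution or better.
# Attribute and operator names are assumed, not confirmed by this page.
query = {
    "query": group_node("and", [
        attribute_node("exptl.method", "exact_match", "X-RAY DIFFRACTION"),
        attribute_node("rcsb_entry_info.resolution_combined", "less_or_equal", 2.0),
    ]),
    "return_type": "entry",
}

print(json.dumps(query, indent=2))
```

Swapping "and" for "or" in group_node widens the result set instead of narrowing it, and group nodes can be nested to mix both kinds of logic in one query.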

Tips

  • Combine with Sequence and/or 3D Similarity searches to further narrow results.

Sequence Similarity

Sequence Search allows users to find biomolecular structures based on amino acid or nucleotide sequences. It can be used to find sequences with a defined level of similarity in PDB entries and/or CSMs. Learn more.

3D Similarity

3D Similarity search allows you to find chains or assemblies that share a similar 3D shape with your query structure. You can select a query from the PDB archive or the available Computed Structure Models (CSMs), or provide your own structure by uploading a file or entering a URL. Learn more.

Sequence Motif

Sequence Motif search allows users to find structures that contain specific sequence patterns (motifs). You can define motifs using several syntax options: basic wildcard characters, PROSITE-style patterns, or regex notation. Learn more.
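
To show how the PROSITE-style and regex notations relate, here is a minimal Python sketch that translates a simplified PROSITE-style pattern into a regular expression and tests it against a sequence. The function prosite_to_regex is a hypothetical helper written for this example, not the RCSB PDB implementation, and it covers only a subset of the PROSITE syntax. The pattern and the test sequence are illustrative only.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Handles '-' separators, 'x' wildcards, [..] allowed sets, {..} excluded
    sets, (n) and (n,m) repeats, and '<' / '>' anchors. Other PROSITE
    features are not covered.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError("empty pattern element")
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except these
        parts.append(prefix + core + ("{" + repeat + "}" if repeat else "") + suffix)
    return "".join(parts)


pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
regex = prosite_to_regex(pattern)
print(regex)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
print(bool(re.search(regex, "CAACAAALAAAAAAAAHAAAH")))  # True
```

The PROSITE form names each position explicitly, while the regex form is more compact; both describe the same set of matching sequences for the subset handled here.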

3D Motif

3D Motif search allows users to find structures that contain a similar 3D arrangement of functionally important residues (e.g., catalytic or binding sites). Choose a predefined motif from the M-CSA (Mechanism and Catalytic Site Atlas), or provide your own by entering a PDB/CSM ID, uploading a coordinate file, or providing a URL. Learn more.

Search Results

The Search Results page allows you to view and explore results at different levels of the biomolecular hierarchy (learn more). You can switch between Structures, Macromolecules, Assemblies, and Ligands result views, depending on what aspect of the data you want to examine.

  • Structures – Displays individual PDB entries (experimental or integrative structures) or CSMs (Computed Structure Models). Each structure corresponds to a single deposition or predicted model.
  • Macromolecules – Shows the polymeric sequences present in those structures, such as proteins, DNA, or RNA matching your query.
  • Assemblies – Presents biologically relevant assemblies, which are the functional complexes formed by one or more macromolecules. Assemblies represent the stable or functional units that carry out biological activity.
  • Ligands – Lists non-polymeric chemical components found in structures, such as small molecules, cofactors, ions, or drugs.

Switching between these views helps you explore your results from different biological perspectives—whether you are interested in whole structures, individual sequences, functional complexes, or chemical components.

Note: Some search criteria apply to specific levels of the biomolecular hierarchy. For example, organism name is defined at the macromolecule (sequence) level rather than the overall structure level. Because a structure may contain multiple macromolecules from different organisms, a Structures view search for an organism will return any structure that contains at least one macromolecule from that organism.

If you need to see exactly which polymeric entities match your search criteria—such as all sequences from a specific organism—you should switch to the Macromolecules view. This allows you to focus directly on the individual protein or nucleic acid sequences that satisfy your query, rather than structures in which they appear.
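
The difference between the two views can be illustrated with a small, self-contained Python sketch. The structure IDs, macromolecule IDs, and organism assignments below are made up, and the two functions are hypothetical stand-ins that only mimic the matching rule described above.

```python
# Toy data: each structure lists its macromolecules with a source organism.
# All IDs and organism assignments are invented for illustration.
structures = {
    "S1": [("S1_1", "Homo sapiens"), ("S1_2", "Mus musculus")],
    "S2": [("S2_1", "Mus musculus")],
    "S3": [("S3_1", "Homo sapiens")],
}


def structures_view(organism):
    """Structures containing at least one macromolecule from the organism."""
    return sorted(
        structure_id
        for structure_id, entities in structures.items()
        if any(source == organism for _, source in entities)
    )


def macromolecules_view(organism):
    """Only the individual macromolecules that come from the organism."""
    return sorted(
        entity_id
        for entities in structures.values()
        for entity_id, source in entities
        if source == organism
    )


print(structures_view("Homo sapiens"))      # ['S1', 'S3']
print(macromolecules_view("Homo sapiens"))  # ['S1_1', 'S3_1']
```

Structure S1 appears in the Structures view even though one of its two macromolecules comes from a different organism; the Macromolecules view lists only the matching sequence S1_1.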

Grouping Results

Redundancy can occur at multiple levels in PDB data: for example, many structures may share similar sequences or nearly identical 3D conformations. To help users explore results without repeated or highly similar entries, the interface provides several grouping options that generate a non-redundant view of search results. These grouping methods can be selected from the “Group results by” dropdown at the top of your search results. Different grouping strategies (such as sequence identity thresholds) allow you to tailor how results are consolidated.

For more details, see Grouping Search Results.
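
As a rough illustration of what a non-redundant view means, the Python sketch below collapses hits that share a cluster identifier and keeps the best-scoring member of each group as its representative. It assumes every hit already carries a cluster ID computed at some sequence identity threshold; the IDs, scores, and the group_hits helper are hypothetical, and the way RCSB PDB actually forms groups and ranks their members is not shown here.

```python
from collections import defaultdict

# Hypothetical hits: (macromolecule ID, precomputed cluster ID, relevance score).
hits = [
    ("A_1", "cluster_1", 0.95),
    ("B_1", "cluster_1", 0.90),
    ("C_1", "cluster_2", 0.80),
    ("D_1", "cluster_1", 0.70),
]


def group_hits(hits):
    """Collect hits by cluster ID, best score first within each group."""
    groups = defaultdict(list)
    for entity_id, cluster_id, score in hits:
        groups[cluster_id].append((entity_id, score))
    for members in groups.values():
        members.sort(key=lambda member: member[1], reverse=True)
    return dict(groups)


groups = group_hits(hits)
# One representative per group: the highest-scoring member.
representatives = {cluster_id: members[0][0] for cluster_id, members in groups.items()}
print(representatives)  # {'cluster_1': 'A_1', 'cluster_2': 'C_1'}
```

Four hits collapse into two groups, so the list to review is half as long while the remaining members stay available inside each group.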

Refinements

The Refinements panel on the left side of the Search Results page allows you to further narrow your results by applying additional filters. This panel works like a drill-down: you can start with a broad result set and then select filters to focus on structures, macromolecules, assemblies, or ligands that meet more specific criteria. Refinements are organized into predefined categories and display the most frequently occurring values or data ranges for quick selection—for example, experimental method, resolution, release date, organism, polymer type, and other commonly used attributes. Applying refinements helps you explore large result sets efficiently and identify the most relevant entries.

Sorting Results

The Sort by menu allows you to control the order in which search results are displayed. By default, results are sorted using relevancy scoring, which ranks entries based on how well they match your search terms. You may choose alternative sorting options—such as release date—from the pull-down menu above the results list.

Other search tools offered by the RCSB PDB (sequence similarity, 3D similarity, sequence motifs, and 3D motifs) use their own specialized scoring methods. When a search includes multiple search types (for example, text + sequence similarity), results are sorted using a combined strategy by default. However, you may also choose to prioritize one search type, such as emphasizing the sequence similarity score over text relevance.

For details, see Sorting Strategies.

Results View Options

You can choose how search hits are displayed using the Summary, Gallery, and Compact view options:

  • Summary view shows each search hit with both an image and key summary information, providing a balanced overview.
  • Gallery view displays results as images only, allowing you to visually scan large sets of structures, macromolecules, assemblies, or ligands.
  • Compact view presents results using text-only summary information, making it easy to browse more items at once without images.

Tabular Reports

Tabular Report options present search results in a multi-column, multi-row table, where each row corresponds to a single search hit. This format is useful for comparing attributes across many entries at once. The interface provides several preconfigured reports that highlight commonly reviewed data (such as experimental details, sequence information, or ligand composition). Users may also create custom tabular reports by selecting the specific data fields they want to include, allowing for flexible analysis and export of tailored result sets.

Context-Specific Results

The results you see depend on both your query and the selected level of the biomolecular hierarchy: Structures, Macromolecules, Assemblies, or Ligands.

For example, when running a Sequence Similarity search at the Macromolecules level, the results include pairwise sequence alignments between your query sequence and each matched entity. Each alignment is accompanied by key metrics such as sequence identity, E-value, and alignment coverage (see Figure 1).

Figure 1: Search result when the polymer return type is selected, showing sequence alignment and related details.

Download Files

The Download Files button, located in the top-right corner of the Search Results page, allows you to export files for either all search results or a manually selected subset.



Please report any encountered broken links to info@rcsb.org
Last updated: 12/9/2025