PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 70%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 70% sequence identity

Year    Number of New Protein Sequences    Total Number of Protein Sequences
1976    13                                 13
1977    8                                  21
1978    3                                  24
1979    2                                  26
1980    4                                  30
1981    8                                  38
1982    17                                 55
1983    9                                  64
1984    10                                 74
1985    12                                 86
1986    8                                  94
1987    11                                 105
1988    23                                 128
1989    40                                 168
1990    40                                 208
1991    49                                 257
1992    59                                 316
1993    180                                496
1994    350                                846
1995    275                                1,121
1996    329                                1,450
1997    458                                1,908
1998    593                                2,501
1999    710                                3,211
2000    821                                4,032
2001    876                                4,908
2002    931                                5,839
2003    1316                               7,155
2004    1818                               8,973
2005    1995                               10,968
2006    2231                               13,199
2007    2458                               15,657
2008    2284                               17,941
2009    2304                               20,245
2010    2298                               22,543
2011    2056                               24,599
2012    2202                               26,801
2013    2332                               29,133
2014    2829                               31,962
2015    2292                               34,254
2016    2570                               36,824
2017    2673                               39,497
2018    2609                               42,106
2019    2792                               44,898
2020    3451                               48,349
2021    2794                               51,143
2022    3545                               54,688
2023    3348                               58,036
2024    3472                               61,508
2025    3932                               65,440
2026    402                                65,842
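
The relationship between the two data columns can be checked with a short script: each cumulative total is the running sum of the yearly counts of new sequences. The following is a minimal sketch in Python, not the method the statistics page itself uses. The yearly counts for 1976 through 1985 are copied from the table; the names new_per_year and cumulative_totals are illustrative only.

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976-1985, copied from the table.
# The variable and function names here are hypothetical, chosen for illustration.
new_per_year = {
    1976: 13, 1977: 8, 1978: 3, 1979: 2, 1980: 4,
    1981: 8, 1982: 17, 1983: 9, 1984: 10, 1985: 12,
}


def cumulative_totals(yearly):
    """Return {year: running total} built from {year: new count}, in year order."""
    years = sorted(yearly)
    running = accumulate(yearly[year] for year in years)
    return dict(zip(years, running))


totals = cumulative_totals(new_per_year)
for year, total in totals.items():
    print(year, new_per_year[year], total)
```

For these ten years the running sum reaches 86 in 1985, which matches the total listed in the table for that year.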