PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 95%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth over time in the number of unique protein sequences (number of polymeric entities). The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. To balance performance against accuracy, the statistics are calculated with a default precision threshold, so a count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 95% sequence identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  10                               23
1978  3                                26
1979  6                                32
1980  4                                36
1981  10                               46
1982  18                               64
1983  11                               75
1984  11                               86
1985  12                               98
1986  9                                107
1987  11                               118
1988  25                               143
1989  46                               189
1990  52                               241
1991  57                               298
1992  66                               364
1993  232                              596
1994  461                              1,057
1995  343                              1,400
1996  407                              1,807
1997  562                              2,369
1998  756                              3,125
1999  896                              4,021
2000  1004                             5,025
2001  1042                             6,067
2002  1111                             7,178
2003  1556                             8,734
2004  2118                             10,852
2005  2340                             13,192
2006  2642                             15,834
2007  2964                             18,798
2008  2760                             21,558
2009  2813                             24,371
2010  2873                             27,244
2011  2636                             29,880
2012  2891                             32,771
2013  3096                             35,867
2014  3801                             39,668
2015  3141                             42,809
2016  3718                             46,527
2017  4060                             50,587
2018  3703                             54,290
2019  4037                             58,327
2020  5035                             63,362
2021  4499                             67,861
2022  5540                             73,401
2023  5159                             78,560
2024  5433                             83,993
2025  6253                             90,246
2026  753                              90,999
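
In the table above, each total equals the previous year's total plus that year's new sequences. The following is a minimal Python sketch of that running sum, using only the first five rows of the table as input. The names `yearly_new` and `cumulative_totals` are illustrative and not part of any PDB tool or API.

```python
from itertools import accumulate

# First five rows of the table: (year, new protein sequences added that year).
yearly_new = [(1976, 13), (1977, 10), (1978, 3), (1979, 6), (1980, 4)]


def cumulative_totals(rows):
    """Return (year, new, total) tuples, where total is the running sum of new."""
    years = [year for year, _ in rows]
    new_counts = [new for _, new in rows]
    totals = accumulate(new_counts)  # running sum over the yearly counts
    return list(zip(years, new_counts, totals))


# Print one line per year: year, new sequences, cumulative total.
for year, new, total in cumulative_totals(yearly_new):
    print(f"{year}  {new:>5}  {total:>7,}")
```

Running the same function over every row should reproduce the third column, which makes it a simple consistency check on the two count columns.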