PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 30%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars represent the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table is linked to the sequence cluster group search result page. For performance reasons, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  11                               11
1977  9                                20
1978  3                                23
1979  2                                25
1980  3                                28
1981  7                                35
1982  14                               49
1983  7                                56
1984  11                               67
1985  10                               77
1986  8                                85
1987  8                                93
1988  18                               111
1989  28                               139
1990  33                               172
1991  42                               214
1992  52                               266
1993  142                              408
1994  296                              704
1995  222                              926
1996  265                              1,191
1997  380                              1,571
1998  462                              2,033
1999  558                              2,591
2000  636                              3,227
2001  679                              3,906
2002  691                              4,597
2003  970                              5,567
2004  1,379                            6,946
2005  1,460                            8,406
2006  1,596                            10,002
2007  1,693                            11,695
2008  1,558                            13,253
2009  1,481                            14,734
2010  1,421                            16,155
2011  1,251                            17,406
2012  1,351                            18,757
2013  1,443                            20,200
2014  1,674                            21,874
2015  1,389                            23,263
2016  1,596                            24,859
2017  1,638                            26,497
2018  1,631                            28,128
2019  1,666                            29,794
2020  2,064                            31,858
2021  1,635                            33,493
2022  2,043                            35,536
2023  1,999                            37,535
2024  2,022                            39,557
2025  2,244                            41,801
2026  259                              42,060
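
In the table above, each cumulative total is the running sum of the yearly counts up to and including that year. The following is a minimal Python sketch of that relationship, using only the first five rows of the table as input. The names `yearly_new` and `cumulative_totals` are illustrative and are not part of any PDB tool or API.

```python
from itertools import accumulate

# Yearly counts of new protein sequences, copied from the first
# five rows of the table (1976-1980).
yearly_new = {1976: 11, 1977: 9, 1978: 3, 1979: 2, 1980: 3}


def cumulative_totals(new_by_year):
    """Return {year: running total} for yearly counts given in any order."""
    years = sorted(new_by_year)
    running = accumulate(new_by_year[year] for year in years)
    return dict(zip(years, running))


totals = cumulative_totals(yearly_new)
for year, total in totals.items():
    print(year, yearly_new[year], total)
```

Comparing the computed totals against the "Total" column is a quick way to check that a copied or re-exported version of the table is internally consistent.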