PST: parallel simulation tool to study open search methods for the identification of peptides with post-translational modifications

About PST

Tandem mass spectrometry is now the method of choice in large-scale proteomic studies. There are currently three types of methods for identifying peptides from mass spectrometry data: database search, de novo sequencing, and library search. Database search is predominant because of its efficiency and robustness. After decades of development, more and more spectra with post-translational modifications can be identified using open search methods, such as spectrum-based open search [1] and tag-based open search [2].

However, these methods do not produce consistent identification results. This phenomenon motivates us to study the relationship between open search results and spectrum quality. We intend to determine the limit of open search under different levels of spectrum quality. To gain a comprehensive understanding, we developed this parallel simulation tool to generate diverse spectra under different conditions. Based on the analytical model, we can then derive a mapping between open search results and spectrum quality.


Related Publication

J. Dai*, F. Yu*, C. Zhou, and W. Yu. *Contributed equally to this work.
"Understanding the limit of open search in the identification of peptides with post-translational modifications — A simulation-based study",
IEEE/ACM Trans. Computational Biology and Bioinformatics, in press, 2020.
Part of the preliminary results appeared as a poster at the 21st Annual International Conference on Research in Computational Molecular Biology (RECOMB 2017).

[1] J. M. Chick, D. Kolippakkam, D. P. Nusinow, B. Zhai, R. Rad, E. L. Huttlin, and S. P. Gygi, “A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides,” Nature Biotechnology, vol. 33, no. 7, pp. 743–749, 2015.
[2] F. Yu, N. Li, and W. Yu, “PIPI: PTM-invariant peptide identification using coding method,” Journal of Proteome Research, vol. 15, no. 12, pp. 4423–4435, 2016.


Where to download PST

Source code (updated 22 Mar 2018): PST_src.zip

Data plotting (updated 18 Dec 2019): REALDATA_Process.zip


Environment configuration

Please install Python 3 and NumPy to use this tool.
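
Before running a simulation, it can help to confirm that both requirements are present. The snippet below is a minimal sketch and is not part of the PST package; the function name check_environment is hypothetical. It only checks that the interpreter is Python 3 and that a NumPy installation can be located, and it does not verify any particular NumPy version.

```python
import importlib.util
import sys


def check_environment():
    """Report whether the interpreter is Python 3 and whether NumPy can be found.

    Hypothetical helper for illustration; it is not shipped with PST.
    """
    return {
        "python3": sys.version_info.major == 3,
        # find_spec returns None when the package is not installed.
        "numpy": importlib.util.find_spec("numpy") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If NumPy is reported as missing, installing it for the same interpreter (for example with python3 -m pip install numpy) should resolve the problem.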


How to use it

Please change the directory to the root of the package before starting simulations.
To start the simulations using tag-based open search, please use the following commands.

python3 scripts/run.py config/sample.json
python3 scripts/local_control.py output/Sample.Output.dat

To start the simulations using spectrum-based open search, please use the following commands.

python3 scripts/run_spec.py config/sample.spec.json
python3 scripts/local_control.py output/Sample.Spectrum.Output.dat

To view the contents of a result file (.dat), please use the following command.

python3 scripts/echo.py output/Sample.Output.dat
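
The two-step simulation commands above can also be driven from a single Python script. The following is a rough sketch under stated assumptions: the script, configuration, and output paths are copied from the sample commands on this page, while PIPELINES, build_commands, and run_pipeline are hypothetical names that do not exist in the PST package. It also assumes the second step should run only after the first one exits successfully.

```python
import subprocess
import sys

# Script, config, and output paths copied from the sample commands on this page.
PIPELINES = {
    "tag": [
        ["scripts/run.py", "config/sample.json"],
        ["scripts/local_control.py", "output/Sample.Output.dat"],
    ],
    "spectrum": [
        ["scripts/run_spec.py", "config/sample.spec.json"],
        ["scripts/local_control.py", "output/Sample.Spectrum.Output.dat"],
    ],
}


def build_commands(mode, python=sys.executable):
    """Return the argument lists for one pipeline (hypothetical helper)."""
    if mode not in PIPELINES:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(PIPELINES)}"
        )
    return [[python, *step] for step in PIPELINES[mode]]


def run_pipeline(mode, package_root="."):
    """Run each step in order from the package root.

    check=True raises CalledProcessError on a non-zero exit code, so the
    second step is skipped if the first one fails (an assumption about the
    intended workflow, not documented PST behavior).
    """
    for cmd in build_commands(mode):
        subprocess.run(cmd, cwd=package_root, check=True)
```

Calling run_pipeline("tag") or run_pipeline("spectrum") from the root of the package would then issue the same commands as listed above, using the interpreter that runs the script.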