### General information

- Authors: Malin Lüking, Yaakov Levy, Johan Elf
- Contact: malin.luking@icm.uu.se
- DOI: 10.17044/scilifelab.21590394
- License: CC BY 4.0
- This readme file was last updated: 01-12-2022

Please cite as: SciLifeLab Data Centre (2021). Dataset on coarse-grained simulations of the lac repressor in different conformations during diffusion and recognition. https://doi.org/10.17044/scilifelab.21590394

### Dataset description

The content of the dataset can be split into four parts: starting structures, processed data, data for visualization in 3D space (to be used in, e.g., PyMOL) and code.

The starting structures contain .pdb files with all-atom models and .dat files with coarse-grained models.

The processed data contain the position of the center of mass of the protein's recognition region relative to the DNA. The data are split into the different systems we studied: the full-length proteins, dimers and monomers in the search and recognition conformations, as well as encounter complexes with A- and B-form DNA. All of these systems were studied at different salt concentrations.

The code folder CG-analysis-rackham contains the code used to plot the data for the figures in the publication, as downloaded from GitHub on November 22, 2022. It contains Jupyter notebooks that analyse the processed data and produce the figures in the publication. It also contains pipeline_trajectory_analysis, which produces the processed data from the trajectories.

The processed data describe the position of the protein relative to the DNA (position along the DNA, position around the DNA and distance from the DNA). These quantities can be obtained from a trajectory using the Spiral package contained in the pipeline_trajectory_analysis folder together with the Ex_spiral1.py script of CG_analysis-rackham. The preprocessed trajectory data can then be plotted with the notebook plotting_CG_sim.ipynb (Figure 2 of the paper).
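To illustrate what "position along and around the DNA and distance from the DNA" means, the sketch below converts Cartesian center-of-mass positions into cylindrical coordinates relative to a DNA axis. It is a minimal, hypothetical example and not the Spiral package or Ex_spiral1.py: it assumes a straight, rigid DNA axis given by a point and a direction, and the helper name `to_dna_frame` and the choice of reference direction for the angle are illustrative assumptions.

```python
import numpy as np


def to_dna_frame(com, axis_point, axis_direction):
    """Hypothetical helper (not part of the dataset code).

    Express center-of-mass positions relative to a straight DNA axis as
    (position along the axis, angle around the axis, distance from the axis).
    Assumes the DNA is a straight line through axis_point along axis_direction.
    """
    com = np.atleast_2d(np.asarray(com, dtype=float))   # shape (N, 3)
    u = np.asarray(axis_direction, dtype=float)
    u = u / np.linalg.norm(u)                            # unit vector along the DNA
    rel = com - np.asarray(axis_point, dtype=float)

    # Position along the DNA: projection of each position onto the axis.
    along = rel @ u
    # Perpendicular component: vector from the axis to the protein.
    perp = rel - np.outer(along, u)
    distance = np.linalg.norm(perp, axis=1)

    # The angle around the DNA needs a reference direction perpendicular to u
    # (an arbitrary choice here; a real analysis would tie it to the DNA sequence/grooves).
    ref = np.array([1.0, 0.0, 0.0])
    if abs(ref @ u) > 0.9:
        ref = np.array([0.0, 1.0, 0.0])
    e1 = ref - (ref @ u) * u
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(u, e1)
    angle = np.arctan2(perp @ e2, perp @ e1)             # radians in (-pi, pi]
    return along, angle, distance


# Example: DNA axis along z through the origin.
along, angle, distance = to_dna_frame([[3, 0, 5], [0, 2, -1]], [0, 0, 0], [0, 0, 1])
```

For this example, `along` is `[5, -1]`, `distance` is `[3, 2]` and `angle` is `[0, pi/2]`. In a real helical or flexible DNA model, the axis and angular reference would have to be defined locally, which is what a dedicated tool such as Spiral is presumably for.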
Diffusion can be analysed and plotted with msd_diffusion_coefficient.ipynb (Figure 3 of the paper). The trajectory data can also be split into 1D and 3D diffusion, and into groove-tracking/sliding motions on the DNA, with analysis_sliding_and_hopping.ipynb (Figure 4 of the paper). Interaction profiles of the protein on the DNA can be plotted using interaction_profiles.ipynb (Figure 5A). Finally, the different energies obtained from the simulations and the bonds formed between the protein and the DNA in the different conformations can be analysed using the script Ex_Bind_Occ.py and the notebook CG_energies_analysis.ipynb (Figures 5C and 5D).

Each zip archive contains a README with further descriptions of its subfolder structure and the files it contains. The same applies to the code.
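As a rough illustration of how a 1D diffusion coefficient can be extracted from positions along the DNA, the sketch below computes a time-averaged mean squared displacement (MSD) and fits the 1D relation MSD(τ) = 2Dτ by least squares through the origin. This is a hypothetical, minimal example and not the method implemented in msd_diffusion_coefficient.ipynb; the function names, the fitting range `max_lag` and the frame spacing `dt` are assumptions, and D comes out in (position unit)² per time unit.

```python
import numpy as np


def msd_1d(x, max_lag):
    """Time-averaged MSD of a 1D trajectory x for lags 1..max_lag (in frames)."""
    x = np.asarray(x, dtype=float)
    lags = np.arange(1, max_lag + 1)
    msd = np.array([np.mean((x[lag:] - x[:-lag]) ** 2) for lag in lags])
    return lags, msd


def diffusion_coefficient_1d(x, dt, max_lag):
    """Hypothetical estimator: fit MSD(tau) = 2 * D * tau through the origin."""
    lags, msd = msd_1d(x, max_lag)
    tau = lags * dt
    slope = np.sum(tau * msd) / np.sum(tau * tau)   # least-squares slope, zero intercept
    return slope / 2.0


# Example: an unbiased +/-1 random walk has MSD(lag) = lag, so D should be close to 0.5.
rng = np.random.default_rng(0)
steps = rng.choice([-1.0, 1.0], size=100_000)
x = np.concatenate([[0.0], np.cumsum(steps)])
D = diffusion_coefficient_1d(x, dt=1.0, max_lag=10)
```

In practice one would apply such an estimate only to the frames in which the protein is bound and moving along the DNA (1D diffusion), which is why separating 1D from 3D motion matters before fitting.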