Molecular Modeling and Bioinformatics Group

BIGNASim database structure and analysis portal for nucleic acids simulation data


Deposit new trajectories in BIGNASim

BIGNASim accepts submissions to incorporate new trajectory data into the repository. Submitted datasets should ideally correspond to finished studies described in one or more journal publications. To ensure the consistency and high quality of the repository, deposited data should meet a specific list of requirements, and authors should provide enough metadata to classify the study efficiently within the database.



Submission requirements

  1. Datasets should correspond to Molecular Dynamics simulation trajectories of nucleic acids (DNA, RNA, PNA), alone or complexed with proteins or small ligands.
  2. Datasets should be supported by a scientific publication. Publications in press or submitted may be acceptable, but the data will be kept on hold until the publication is publicly available, and eventually removed if this does not happen within a reasonable time. One or more datasets can be derived from a single publication, but all submitted data should be referred to explicitly in the paper or its Supplementary Material. In this sense, BIGNASim can be used as online Supplementary Material for the publication, provided that enough processing time is allowed.
  3. Submitted trajectories are limited to 5,000 frames covering the complete simulation time. Solvent should be removed, but ions may remain when their presence is relevant to the study. Trajectories should be imaged and individual frames superimposed. A trajectory can be split into multiple files if necessary. Table S7 lists the accepted formats.
  4. Trajectories should be accompanied by a topology file, in an appropriate format, that matches the contents of the trajectory files. PDB files are acceptable as topologies. BIGNASim uses MDAnalysis to handle trajectory and topology formats. Refer to Table S7 for additional format information.
  5. Submissions should include a series of mandatory metadata items (Table S7), but authors are encouraged to provide metadata covering the entire ontology, as a complete annotation of the submitted data allows better indexing.
  6. Trajectories should fulfil the quality requirements shown in Table S7. Quality will be further checked by the BNS team as part of the analysis step carried out after submission.
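
Requirement 3 caps each trajectory at 5,000 frames covering the whole simulation. The sketch below, assuming frames are subsampled with a constant stride (one possible approach, not necessarily what the BIGNASim team does), computes the smallest stride that respects the limit and the resulting time per frame to report in the metadata. The function names and the 2 ps example are illustrative only.

```python
MAX_FRAMES = 5000  # frame limit stated in requirement 3


def subsampling_stride(n_frames: int, max_frames: int = MAX_FRAMES) -> int:
    """Smallest stride s such that keeping frames 0, s, 2s, ...
    leaves at most max_frames frames (integer ceiling division)."""
    if n_frames <= 0:
        raise ValueError("the trajectory must contain at least one frame")
    return (n_frames + max_frames - 1) // max_frames


def subsampled_summary(n_frames: int, time_per_frame_ns: float) -> dict:
    """Stride, number of kept frames and new time per frame (ns)."""
    stride = subsampling_stride(n_frames)
    kept = len(range(0, n_frames, stride))  # frames 0, stride, 2*stride, ...
    return {
        "stride": stride,
        "frames": kept,
        "time_per_frame_ns": time_per_frame_ns * stride,
    }


if __name__ == "__main__":
    # 400 ns saved every 2 ps (0.002 ns) -> 200,000 raw frames
    info = subsampled_summary(200_000, 0.002)
    print(info["stride"], info["frames"])  # prints: 40 5000
```

With MDAnalysis, which BIGNASim uses for format handling, such a stride could be applied while writing the stripped, imaged trajectory by iterating over `u.trajectory[::stride]`; other MD tools usually offer an equivalent stride or offset option.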

Table S7. Deposition requirements

TRAJECTORY AND TOPOLOGY FORMATS
Acceptable trajectory formats    Preferred: DCD, CRD, XTC, PDB (models), NetCDF, BINPOS
Acceptable topology formats      Preferred: PDB, Amber TOP, Gromacs GRO, ITP, RTP, NAMD PSF
MINIMUM SET OF METADATA
Dataset Description
  • Description/aim of the study
  • References for supporting publication(s)
System Description
  • Reference experimental structure (PDB, NDB id)
  • Type of nucleic acid, main architecture (single strand, duplex, etc.), RNA type
  • Composition (Naked NA, Complexes)
  • Relevant sequence modifications or features
  • Relevant local structures
Simulation conditions
  • Force Field (type and precise version)
  • Simulation length
  • Simulation temperature
  • Solvent and ions
  • Charge settings, added salt
  • Type of trajectory (equilibrium, folding/unfolding, transition)
  • Number of frames, Time per frame
Preliminary Analyses
  • RMSd, RMSd/bp, R. Gyration Variation, Lost WC H-bonds, Lost 3D contacts
  • Presence of fraying, Global avg. Roll (degrees), Global avg. Twist (degrees), Groove dimensions (specify measurement method)
ORIENTATIVE QUALITY CHECKLIST (applicable to equilibrium trajectories)
Simulation length                > 200 ns
RMSd *                           < 5 Å
RMSd/bp *                        < 0.3 Å/bp
R. Gyration variation            < 0.4 Å/bp
Loss of WC H-bonds *             < 20%
Loss of 3D contacts *            < 30%
Maintenance of global fold       > 90% of simulation time

(*) RMSd: all heavy atoms, mass-weighted. The reference should be the experimental structure when available; otherwise, use canonical fiber data. (&) Applicable to average values in duplex segments.
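
The footnote defines RMSd as a mass-weighted value over all heavy atoms, and the checklist gives orientative thresholds. A minimal sketch of a local pre-check follows. It assumes each frame has already been superimposed on the reference (as requirement 3 asks), so no fitting is done; the tag names follow the example metadata file in the submission procedure, `foldKept` is a hypothetical tag, and the duplex-average rule marked (&) is not handled.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def mass_weighted_rmsd(frame: Sequence[Coord], ref: Sequence[Coord],
                       masses: Sequence[float]) -> float:
    """sqrt(sum_i m_i |x_i - r_i|^2 / sum_i m_i) over heavy atoms.

    Assumes `frame` is already superimposed on `ref` (no fitting here).
    """
    if not masses or not (len(frame) == len(ref) == len(masses)):
        raise ValueError("frame, ref and masses must be non-empty and equal in length")
    weighted_sq = sum(
        m * sum((a - b) ** 2 for a, b in zip(x, r))
        for x, r, m in zip(frame, ref, masses)
    )
    return math.sqrt(weighted_sq / sum(masses))


# Orientative thresholds from Table S7 as (tag, comparison, limit).
# Tags follow the example metadata file; "foldKept" (% of simulation
# time with the global fold maintained) is a hypothetical tag name.
CHECKLIST = [
    ("trajLength", ">", 200.0),   # ns
    ("rmsd", "<", 5.0),           # Å
    ("rmsd_bp", "<", 0.3),        # Å/bp
    ("Rgyr", "<", 0.4),           # Å/bp
    ("lostWC", "<", 20.0),        # %
    ("lostContacts", "<", 30.0),  # %
    ("foldKept", ">", 90.0),      # % of simulation time
]


def check_quality(values: Dict[str, float]) -> List[str]:
    """List checklist items that fail or are missing (empty list = all pass)."""
    problems = []
    for tag, op, limit in CHECKLIST:
        if tag not in values:
            problems.append(f"{tag}: not provided")
            continue
        value = values[tag]
        passed = value > limit if op == ">" else value < limit
        if not passed:
            problems.append(f"{tag}: {value} (expected {op} {limit})")
    return problems


if __name__ == "__main__":
    # Values taken from the example metadata file
    example = {"trajLength": 400, "rmsd": 4.1, "rmsd_bp": 0.14, "Rgyr": 0.15,
               "lostWC": 9, "lostContacts": 16}
    print(check_quality(example))  # prints: ['foldKept: not provided']
```

Since the checklist is orientative, a failed item is best read as a point to discuss with the BNS team rather than as an automatic rejection.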



Submission procedure

Before starting the submission procedure, authors should prepare the following material:

  • Trajectory:   One or several trajectory files. Check the requirements section to ensure that your data is prepared for BIGNASim deposition.
  • Topology:   A topology file matching the trajectory file(s). Check the accepted formats in Table S7.
  • Analyses:   The results of a set of simple preliminary analyses that validate the quality of the submitted data. Check the orientative quality checklist in Table S7.
  • Documentation:   A copy of the relevant publications describing the dataset, unless they are openly available.
  1. Register in BIGNASim:
    When registering, indicate your intention to deposit new datasets, and follow the instructions.

  2. Upload the necessary files:
    Trajectories, topology, publication files and any other files to be submitted should first be uploaded to the user workspace via HTTP. Files and directories can be compressed (see the manual), and a trajectory can be split into multiple files. However, if large datasets are to be submitted, please contact the BNS team, as other procedures can be arranged to avoid transmission problems.

  3. Deposition:
    From your workspace, click the "Initiate Deposition" button to open the deposition forms, and follow the instructions therein. The forms request, in two separate steps, all fields listed in Table S7 as well as other non-mandatory fields.
    First, define the dataset and its associated publication, and select the uploaded files to be included in the deposit.
    Second, specify the metadata describing the uploaded files.
    Alternatively, a metadata file containing these fields can be included directly in the deposit, so that the second step can be skipped (the supplied metadata can still be modified in this second step). In this way, metadata can be uploaded at once or reused from a previous submission. The metadata file is a CSV-style text file with one "tag ; value" pair per line; note that the separator is a semicolon, not a comma. Accepted tags are:

    • BIGNASim ontology terms: a single identifier per line is enough, although the label can also be included in the second column for clarity.
    • Specific tags: the complete list of tags and their descriptions is available on the BIGNASim website.


    submissionID      ; BNS_pdb1ERE_0000
    datasetName       ; pdb1ERE
    datasetDescription; 1ERE thermal stability
    publDateSys       ; hold
    publDate          ; 2016/01/15
    pubSys            ; document
    PDB               ; 1ere
    counterions       ; Na+ (0.05)
    trajLength        ; 400
    frames            ; 4000
    frameStep         ; 0.1
    trajTemperature   ; 300
    10101             ; DNA
    102020101         ; Linear
    10301             ; Naked
    10402             ; B
    201010104         ; ParmBSC0-OL1
    2010201           ; NanoSecondRange
    2010301           ; PhysiologicalTemp
    2010401           ; Water
    2010501           ; Electroneutral
    2010601           ; Dang
    2010701           ; TIP3P
    20202             ; Folding
    30101             ; Snapshot
    rmsd              ; 4.1
    rmsd_bp           ; 0.14
    Rgyr              ; 0.15
    lostWC            ; 9
    lostContacts      ; 16
    fraying           ; no
    avgTwist          ; 20
    minorGrooveSize   ; 6*8.2
    majorGrooveSize   ; 11.6*8.5
    

  4. Confirm Submission:
    Click "Complete Submission" to finish the process. After the submission is completed, an accession number will be issued, and a log file will be generated to report the progress of the request. Submitted data will be locked and kept on hold until processed. To modify the contents of the submission afterwards, contact the BNS team.

  5. Validation:
    Trajectories will be checked and the complete collection of BIGNASim analyses will be performed. Again, a detailed log of the analyses performed will be available, and the status file will be updated accordingly. If any of the analyses fails because of issues in the incoming data, the authors will be informed and asked to amend the problem. In the absence of errors, the dataset will be incorporated into the database and made public immediately, or at the indicated date if one was given. Note that the analysis procedure may take some time.
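
Because problems in the metadata file are only reported once the deposit is processed, a quick local check before upload can save a round trip. A minimal sketch, assuming the format described in step 3 (one "tag ; value" pair per line, with numeric identifiers for ontology terms whose label is optional); the function names are hypothetical, and the rule trajLength = frames * frameStep is inferred from the example file (400 = 4000 * 0.1), not a documented BIGNASim validation.

```python
from typing import Dict, List, Tuple

# Excerpt of the example metadata file shown in step 3
SAMPLE = """
submissionID      ; BNS_pdb1ERE_0000
trajLength        ; 400
frames            ; 4000
frameStep         ; 0.1
10101             ; DNA
102020101         ; Linear
"""


def parse_metadata(text: str) -> List[Tuple[str, str]]:
    """Parse 'tag ; value' lines into ordered (tag, value) pairs.

    Blank lines are skipped and whitespace around fields is stripped.
    Only the first ';' splits a line. A line without ';' is accepted only
    if it is a bare numeric ontology identifier (the label is optional).
    """
    pairs = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        tag, sep, value = line.partition(";")
        tag = tag.strip()
        if not sep and not tag.isdigit():
            raise ValueError(f"line {lineno}: expected 'tag ; value', got {line!r}")
        pairs.append((tag, value.strip()))
    return pairs


def split_metadata(pairs: List[Tuple[str, str]]) -> Tuple[Dict[str, str], List[str]]:
    """Separate specific tags (name -> value) from numeric ontology identifiers."""
    tags: Dict[str, str] = {}
    ontology: List[str] = []
    for tag, value in pairs:
        if tag.isdigit():
            ontology.append(tag)
        else:
            tags[tag] = value
    return tags, ontology


def check_frames(tags: Dict[str, str], max_frames: int = 5000,
                 rel_tol: float = 1e-6) -> List[str]:
    """Frame-related sanity checks; returns a list of problems (empty = OK).

    Assumes trajLength and frameStep share the same time unit (ns in the
    example file), so trajLength should equal frames * frameStep.
    """
    problems = []
    frames = int(tags["frames"])
    length = float(tags["trajLength"])
    step = float(tags["frameStep"])
    if frames > max_frames:
        problems.append(f"frames = {frames} exceeds the {max_frames}-frame limit")
    if abs(frames * step - length) > rel_tol * max(abs(length), 1.0):
        problems.append(f"trajLength {length} != frames * frameStep = {frames * step}")
    return problems


if __name__ == "__main__":
    tags, ontology = split_metadata(parse_metadata(SAMPLE))
    print(tags["submissionID"], ontology, check_frames(tags))
    # prints: BNS_pdb1ERE_0000 ['10101', '102020101'] []
```

Splitting only on the first semicolon keeps free-text values intact, and rejecting non-numeric lines without a separator catches specific tags whose value was forgotten.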