
...

Raw Data: Newly acquired raw data is stored on disk and copied to institutional storage in a timely fashion.

The Fermilab E-1039/SpinQuest experiment expects to collect approximately 20 TB of raw data between commissioning and the end of data acquisition in 2022. The raw data are subsequently processed and stored in a MySQL database, which will be approximately twice the size of the raw data. In addition, a substantial volume of simulated Monte Carlo events is produced and stored at collaborating institutions and universities.
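As a rough planning aid, the storage arithmetic in the paragraph above can be sketched in Python. This minimal sketch assumes only the figures quoted in this plan: about 20 TB of raw data and a MySQL database about twice that size. It ignores Monte Carlo samples, backup copies, and off-site mirrors, and the function name is hypothetical.

```python
def storage_estimate_tb(raw_tb: float, db_factor: float = 2.0) -> dict:
    """Rough storage footprint in TB.

    Assumes the MySQL database is about db_factor times the raw-data
    volume, as stated in the plan. Monte Carlo samples, backup copies
    and mirrors are NOT included (illustrative sketch only).
    """
    db_tb = raw_tb * db_factor
    return {"raw_tb": raw_tb, "mysql_tb": db_tb, "total_tb": raw_tb + db_tb}


print(storage_estimate_tb(20.0))  # {'raw_tb': 20.0, 'mysql_tb': 40.0, 'total_tb': 60.0}
```

With the quoted inputs, the raw data plus the processed database come to roughly 60 TB before any additional copies are counted.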

The raw data consist of event records from the CODA data acquisition system. These records contain the digitized hit information from the various detector elements, for example drift times from tracking chambers, hodoscope hits, and scaler values. These data will be stored onsite at Fermilab in the experimental counting house on a RAID disk array. In addition, new data are copied daily to the Fermilab STKEN Enstore system (which is located in a separate building) for additional protection against data loss.
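A daily copy to Enstore implies a step that selects the files written since the last successful transfer. The minimal sketch below shows that selection. It assumes that raw files sit in a single directory and that file modification times are a reliable marker. The directory, file pattern and function name are hypothetical, and the actual transfer into Enstore is not shown.

```python
from pathlib import Path


def files_to_copy(data_dir: Path, last_sync: float, pattern: str = "*.dat") -> list[Path]:
    """Return raw-data files modified after the last successful copy.

    last_sync is a Unix timestamp recorded after the previous transfer.
    The '*.dat' pattern and the reliance on modification times are
    assumptions for illustration, not the experiment's actual procedure.
    """
    return sorted(
        p for p in data_dir.glob(pattern)
        if p.is_file() and p.stat().st_mtime > last_sync
    )
```

For example, files_to_copy(Path("/path/to/raw"), last_sync) would list the candidates for that day's copy. The path is a placeholder.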

The raw data are then decoded and stored in a MySQL database. Decoding translates the raw CODA data records into a more user-friendly format, for example by assigning specific wire numbers in the tracking chambers to digitized drift-time information, or hodoscope numbers to hits. Further processing then converts these hits into reconstructed tracks and events, which are also stored in the MySQL database. The MySQL database is also hosted on site in the SpinQuest counting house. For ease of access and data security, the MySQL database is mirrored off site on a RAID system at the University of Illinois, and possibly at other collaboration sites in the future. The source code, related calibrations, alignment data and other inputs needed to translate the raw data into the MySQL database are kept under Subversion (SVN) version control, and a second copy of this information is maintained at the University of Illinois. MySQL is an open-source database system that is widely available and well supported.
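The decoding step described above is essentially a table lookup from electronics addresses to detector elements. The sketch below illustrates that idea under stated assumptions. The (crate, slot, channel) addressing, the detector labels, the channel-map contents and the TDC conversion factor are all hypothetical. They do not describe the actual SpinQuest decoder or its mapping tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawHit:
    crate: int
    slot: int
    channel: int
    tdc: int  # raw TDC counts


@dataclass(frozen=True)
class DecodedHit:
    detector: str
    element: int    # wire or hodoscope number
    time_ns: float  # drift/hit time converted from TDC counts


# Hypothetical channel map: (crate, slot, channel) -> (detector, element).
# Real mappings would come from the version-controlled mapping/calibration data.
CHANNEL_MAP = {
    (1, 3, 0): ("chamber1", 1),
    (1, 3, 1): ("chamber1", 2),
    (2, 5, 7): ("hodoscope1", 4),
}

TDC_NS_PER_COUNT = 0.5  # assumed TDC resolution, illustrative only


def decode(hits: list[RawHit]) -> list[DecodedHit]:
    """Translate raw electronics hits into detector-level hits."""
    decoded = []
    for h in hits:
        key = (h.crate, h.slot, h.channel)
        if key not in CHANNEL_MAP:
            continue  # unmapped channel: skipped here; a real decoder might log it
        detector, element = CHANNEL_MAP[key]
        decoded.append(DecodedHit(detector, element, h.tdc * TDC_NS_PER_COUNT))
    return decoded
```

Keeping the map as data rather than code mirrors the plan's approach of storing mappings and calibrations alongside, but separate from, the decoding logic.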

It is the SpinQuest Collaboration’s policy that these raw data and processed MySQL data are available to collaboration members for use in collaboration-approved scientific studies and analyses. Completed analyses will be submitted for publication and shared with outside researchers. SpinQuest will maintain the ability to access these data for a minimum of 7 years after the completion of the experiment.

Contributions from the Fermilab Scientific Computing Division:

  • Provide appropriate networking at the NM4 hall, including WiFi in both the counting area and the detector hall, for commissioning, data transfers to mass storage, network access for users’ laptops, etc. Provide any firewalls/bridges that Fermilab deems necessary to isolate the experiment’s network from the general Fermilab network.

  • Provide “General Computing” accounts for collaborators. Primary analysis and Monte Carlo computing will be done on Linux-based PCs provided by the collaboration.

  • Provide storage for 50 TB of raw data. The collaboration also plans to keep a second copy of the raw data on a separate disk system.

  • Support for 4 virtual machines.

  • Access to grid resources, including the Open Science Grid and FermiGrid.

Processed Data: Processed data is initially stored on disk and migrated to institutional storage as required. The raw data from the SpinQuest detector are stored on disk at a rate of about 0.5 TB/week. They contain information on the particles as they traverse the detector components, as well as information on target polarization and other target parameters. The processed data are also stored on disk for analysis by members of the SpinQuest research community. Processed data are in Data Stitch Tajima (DST) format and will be analyzed with a ROOT-based reconstruction and analysis framework.
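Assuming a steady acquisition rate of about 0.5 TB/week and the roughly 20 TB total quoted for the run, the required running time can be estimated as shown below. This is a back-of-the-envelope sketch that ignores downtime and rate changes. The function name is hypothetical.

```python
import math


def weeks_to_collect(total_tb: float, rate_tb_per_week: float) -> int:
    """Whole weeks of running needed to accumulate total_tb at a steady rate.

    Assumes a constant rate with no downtime (illustrative estimate only).
    """
    if rate_tb_per_week <= 0:
        raise ValueError("rate must be positive")
    return math.ceil(total_tb / rate_tb_per_week)
```

With these inputs, weeks_to_collect(20, 0.5) gives 40 weeks of steady running.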

...

Log Books: SpinQuest uses an electronic logbook system, the SpinQuest ECL, with a database back-end.

Calibration and Geometry databases: Running conditions, detector calibration constants and detector geometries are stored in a database at Fermilab.

Other databases: Other databases may be relevant to data management, for example the JInventory database tool that catalogs which electronic modules were in the online systems.
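Calibration constants and geometries usually apply to a range of runs rather than to a single run. One common way to model this is an interval-of-validity lookup, sketched below. The schema of the actual Fermilab database is not described in this plan, so the entry structure, the constant names and the class name are assumptions for illustration.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibEntry:
    first_run: int               # first run this set of constants applies to
    constants: dict[str, float]  # hypothetical constant names -> values


class CalibrationTable:
    """Look up the constants valid for a given run number.

    Each entry is valid from its first_run up to (but not including)
    the next entry's first_run; the last entry stays valid indefinitely.
    """

    def __init__(self, entries: list[CalibEntry]):
        self._entries = sorted(entries, key=lambda e: e.first_run)
        self._starts = [e.first_run for e in self._entries]

    def for_run(self, run: int) -> dict[str, float]:
        # Index of the last entry whose first_run <= run.
        i = bisect.bisect_right(self._starts, run) - 1
        if i < 0:
            raise KeyError(f"no calibration valid for run {run}")
        return self._entries[i].constants
```

Storing only the starting run of each entry keeps updates simple: a new calibration is appended with its first valid run, and earlier runs keep their original constants.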

Analysis software source code and build systems: Data analysis software is developed within the SpinQuest reconstruction and analysis package. Contributions to the package come from several sources: lab staff, university faculty, Fermilab users, off-site lab collaborators and third parties. Locally written source code and build files, along with contributions from collaborators, are stored in the git version management system. Third-party software is managed by software maintainers under the oversight of the Analysis Working Group. Source code repositories and managed third-party packages are continually backed up by IT.

Documentation: Documentation is available online, either as content maintained in a content management system (CMS), such as a wiki on GitHub or Confluence pages, or as static web pages. This content is backed up by IT. Source code documentation is maintained continually as part of the software through Doxygen (C++) and Javadoc (Java). Other software documentation is distributed via wiki pages and consists of a combination of HTML and PDF files. Documentation LaTeX source files are stored in the source code repository under the subheading “docs”. Maintenance of the wiki is performed by a small Hall B group.

Quality Assurance: As stated in the lab data management plan, the data management process is overseen by the Deputy Director for Science, and data management will be reviewed periodically. Quality assurance of the software is ultimately the responsibility of an Analysis Coordinator and a committee, selected from the collaboration, that reviews the reconstruction software.

...