
Summary

This SpinQuest data management plan details the collaboration's plan to responsibly manage the scientific data recorded by the SpinQuest experiment at the Fermilab NM4 facility, and is intended as a reference for the plans of the upcoming experiment using the SpinQuest target and detectors. The Collaboration Chair and Spokesperson Dustin Keller manages collaboration membership together with Spokesperson Kun Liu. The Fermilab liaison manages safety and experimental hall activities, and computing accounts are managed through the Fermilab badge and ID process. The collaboration is responsible for the software utilities used for reconstruction, calibration, and monitoring, and for all major aspects of event reconstruction.

Responsibilities

The SpinQuest Collaboration is responsible for data management at the NM4 facility, including all target, spectrometer, and physics data. The maintenance of this document, the plan it describes, and its implementation are the responsibility of the Software Management team of SpinQuest, formed by project leadership. This team is made up of members from the University of Virginia and Los Alamos National Laboratory, as well as additional institutions that volunteer to take ongoing roles in this regard.

Data Management processes

The data management processes are listed below according to the broad categories of data that they address:

Raw Data: Newly acquired raw data is stored on disk and copied to institutional storage in a timely fashion.

Processed Data: Processed data is initially stored on disk and migrated to institutional storage as required. The raw data from the SpinQuest detector are stored on disk at a rate of about 0.5 TB/week and contain information on particles as they traverse the detector components, as well as information on target polarization and other target parameters. The processed data are also stored on disk for analysis by members of the SpinQuest research community. Processed data is in Data Summary Tape (DST) format and is analyzed with a ROOT-based reconstruction and analysis framework.
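
As an illustration of this workflow, the sketch below shows how a collaborator might open a DST file and loop over its event tree with ROOT. The file name, tree name, and branch name ("dst.root", "events", "nTracks") are hypothetical placeholders and do not reflect the actual SpinQuest DST schema.

    // Minimal ROOT macro: open a DST file and loop over an event tree.
    // File, tree, and branch names are hypothetical placeholders.
    #include "TFile.h"
    #include "TTree.h"
    #include <iostream>

    void readDST() {
        TFile *f = TFile::Open("dst.root", "READ");
        if (!f || f->IsZombie()) { std::cerr << "cannot open file\n"; return; }
        TTree *t = nullptr;
        f->GetObject("events", t);                 // fetch the event tree
        if (!t) { std::cerr << "tree not found\n"; return; }
        int nTracks = 0;
        t->SetBranchAddress("nTracks", &nTracks);  // bind branch to a local
        const Long64_t n = t->GetEntries();
        for (Long64_t i = 0; i < n; ++i) {
            t->GetEntry(i);
            if (i < 5) std::cout << "event " << i << ": " << nTracks << " tracks\n";
        }
        f->Close();
    }

Such a macro would be executed interactively with "root -l readDST.C" from within the analysis framework's environment.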

Run Conditions: Run conditions (machine energy, beam intensity, target polarization, etc.) are stored in the experiment logbook and in a dedicated database.

Databases: Database servers are managed by IT and regular snapshots of the database content are stored along with the tools and documentation required for their recovery.
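
For illustration, a snapshot of this kind might be taken with the standard mysqldump client, invoked here from a small C++ helper; the host name, output path, and credential handling are hypothetical, and in practice such a dump would more likely be scheduled directly as a cron job.

    // Hypothetical snapshot helper: dump all databases to a dated SQL file
    // by invoking the standard mysqldump client. Host and output path are
    // placeholders; credentials would come from a protected option file.
    #include <cstdlib>
    #include <ctime>
    #include <string>

    int main() {
        char date[16];
        std::time_t now = std::time(nullptr);
        std::strftime(date, sizeof(date), "%Y%m%d", std::localtime(&now));
        std::string cmd =
            "mysqldump --all-databases --single-transaction "
            "--host=db.example.edu > /backups/snapshot-" + std::string(date) + ".sql";
        return std::system(cmd.c_str());
    }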

Log Books: SpinQuest uses an electronic logbook system, the SpinQuest ECL, with a database back-end. Calibration and geometry databases: Running conditions, as well as the detector calibration constants and detector geometries, are stored in a database at Fermilab.

Other databases: Other databases may be relevant to data management, for example the JInventory database tool, which catalogs which electronic modules are in the online systems.

Analysis software source code and build systems: Data analysis software is developed within the SpinQuest reconstruction and analysis package. Contributions to the package come from several sources: lab staff and users, off-site collaborators, and third parties. Locally written source code and build files, along with contributions from collaborators, are stored in a version control system (Git). Third-party software is managed by software maintainers under oversight of the Software Support Committee. Source code repositories and managed third-party packages are backed up by IT.

Documentation: Documentation is available online in the form of content either maintained by a content management system (CMS), such as a wiki or Drupal, or as static web pages. This content is backed up by IT. Source code documentation is generated from the software itself with Doxygen (C++) and Javadoc (Java). Other documentation for the software is distributed via wiki pages and consists of a combination of HTML and PDF files. Documentation LaTeX source files are stored in the source code repository under a "docs" subdirectory. Maintenance of the wiki is performed by a small collaboration group.
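
For illustration, the snippet below shows the Doxygen comment style such source documentation would use; the function and its parameters are hypothetical and are not part of the actual SpinQuest code base.

    /// Hypothetical example of Doxygen-style source documentation.
    /// Converts a measured drift time into a drift distance.
    ///
    /// @param driftTimeNs  Measured drift time in nanoseconds.
    /// @param velocity     Effective drift velocity in cm/ns.
    /// @return             Drift distance in centimeters.
    double driftDistance(double driftTimeNs, double velocity) {
        return driftTimeNs * velocity;
    }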

Quality Assurance: As stated in the laboratory data management plan, the data management process is overseen by the Deputy Director for Science, and periodic reviews of data management will be made. Quality assurance of the software is ultimately the responsibility of the Analysis Coordinator and a committee selected from the collaboration to review the reconstruction software.

The Fermilab E-1039/SpinQuest experiment expects to collect approximately 5 TB of raw data between commissioning and the end of data acquisition in 2022. The raw data is subsequently processed and stored in a MySQL database. The MySQL database will be approximately twice the size of the raw data, or 10 TB. In addition, there is a substantial volume of simulated Monte Carlo events; these events are stored directly in the MySQL database.

The raw data consists of event records from the CODA data acquisition system. These records contain the digitized hit information from the various detector elements, including, for example, drift times from tracking chambers, hodoscope hits, scaler values, etc. These data will be stored onsite at Fermilab in the experimental counting house on a RAID disk array. In addition, new data are copied daily to the Fermilab STKEN Enstore system (located in a separate building) for additional protection against data loss.
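
To make the shape of such a record concrete, the sketch below gives one plausible in-memory representation of a decoded hit; the type and field names are hypothetical and do not reflect the actual CODA record layout, which consists of packed binary banks.

    // Hypothetical in-memory representation of a single decoded hit.
    // Field names are illustrative only.
    #include <cstdint>

    struct DecodedHit {
        uint32_t runID;       // run number the hit belongs to
        uint32_t eventID;     // event number within the run
        uint16_t detectorID;  // which chamber or hodoscope plane fired
        uint16_t elementID;   // wire or paddle number within the plane
        double   driftTimeNs; // digitized drift time (chambers only)
    };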

The raw data are then decoded and stored in a MySQL database. The decoding takes the information in the raw CODA data records and translates it into a more user-friendly format, for example, assigning specific wire numbers in tracking chambers to digitized drift-time information, or hodoscope numbers to hits. Further processing then turns these hits into reconstructed tracks and events, which are also stored in the MySQL database. The MySQL database is also hosted on site in the SeaQuest counting house. For ease of access and data security, the MySQL database is mirrored off site at the University of Illinois on a RAID system, and possibly at other collaboration sites in the future. The source code, related calibrations, alignment data, etc. needed to translate the raw data to the MySQL database are under Subversion (SVN) version control. A second copy of this information is also maintained at the University of Illinois. MySQL is an open-source database system that is widely available and well supported.
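
As a rough sketch of the final step of this decoding chain, the fragment below writes one decoded hit into MySQL through the standard MySQL C API; the connection parameters and the "kHit" table and column names are hypothetical placeholders, not the experiment's actual schema.

    // Hypothetical decoding output step: insert one decoded hit into MySQL
    // using the standard MySQL C API. Host, credentials, database name, and
    // the "kHit" table/columns are placeholders, not the real schema.
    #include <mysql/mysql.h>
    #include <cstdio>

    int main() {
        MYSQL *conn = mysql_init(nullptr);
        if (!mysql_real_connect(conn, "localhost", "decoder", "password",
                                "spinquest", 0, nullptr, 0)) {
            std::fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
            return 1;
        }
        // One decoded hit: detector plane 12, wire 87, drift time 143.5 ns.
        const char *query =
            "INSERT INTO kHit (detectorID, elementID, driftTime) "
            "VALUES (12, 87, 143.5)";
        if (mysql_query(conn, query) != 0) {
            std::fprintf(stderr, "insert failed: %s\n", mysql_error(conn));
        }
        mysql_close(conn);
        return 0;
    }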

It is the SpinQuest Collaboration’s policy that these raw data and processed MySQL data are available to collaboration members for use in collaboration-approved scientific studies and analyses. Completed analyses will be submitted for publication and shared with outside researchers. SpinQuest will maintain the ability to access these data for a minimum of 7 years after the completion of the experiment.

Contribution from the Fermilab Scientific Computing Division:

  • Provide appropriate networking at the NM4 hall, including WiFi in both the counting area and the detector hall, for commissioning, data transfers to mass storage, network access for users’ laptops, etc. Provide the firewalls/bridges that Fermilab deems necessary to isolate the experiment’s network from the general Fermilab network.

  • Provide “General Computing” accounts for collaborators. Primary analysis and Monte Carlo computing will be done on Linux-based PCs provided by the collaboration.

  • Provide storage for 50 TB of raw data. The collaboration also plans to keep a second copy of the raw data on a separate disk system.

  • Support for 4 virtual machines.

  • Access to grid resources, including the Open Science Grid and FermiGrid.
