Streamline to Graduation

Complete the QTracker note - We first need to complete the angular distribution analysis well enough that we can finish this note with a real-world example of a full physics extraction with minimized error and a detailed quantification of the remaining uncertainties. Have it reviewed by Kun and Kenichi and respond to all analysis or update requests and questions that they have.  We both must help prepare this note for them to read and make it easy for them to get through quickly.  I think what will make this go fast is to ensure the note is written in a thorough but compact way.  We should format it in NIM style like the basket article, with line numbers and such, to make the review easy.  I expect we will submit this as a NIM article at the same time we give it to Kun and Kenichi to review, but there is no requirement for it to be published before you defend; it must, however, have passed Kun and Kenichi's review before you defend.  I expect this review will take on the order of 3 weeks, assuming we prepared the write-up sufficiently well.

Here are some of those details again:
Starting from the CNN inference (and the basic 3D chi2 fit), extract the true values and determine the best accuracy and precision achievable in each case using MC.  For this review we need not have the systematics of the analysis in their final state, but we still need to have written up the propagation of the new covariance (from MC to real data), physics background mitigation and the resulting contamination effects, dump background, and any errors from partial-track limits as a function of statistics.  The idea is to do our best to make all error contributions to the observables (lambda, mu, nu) as small as possible and rigorously quantify what remains.

Then introduce each of the backgrounds one by one: physics background (J/psi, psi', single muons), dump background (all channels), and contamination from partial tracks (full messy events).  You want to quantify the level of error on the extracted observables that each one adds; try to keep them all small, but we need to understand what the major contributors are and how best to suppress them.  Assuming this goes well and you are able to get reproducible results for the extraction of lambda, mu, nu that are very close to the true values, you then want to add in the covariance matrix.

You can do this by smearing each muon 4-vector by sampling from a multivariate normal distribution using the full covariance matrix:

cov_matrix = np.array([[dpx**2, dpxpy,  dpxpz,  dpxE ],
                       [dpxpy,  dpy**2, dpypz,  dpyE ],
                       [dpxpz,  dpypz,  dpz**2, dpzE ],
                       [dpxE,   dpyE,   dpzE,   dE**2]])

Where each of these terms is a function of the input 4-vector components that you determined.

Then generate the smearing:

mean = np.zeros(4)  # zero-mean smearing
errors = np.random.multivariate_normal(mean, cov_matrix, num_samples)

original_4Vec = np.random.rand(num_samples, 4)  # placeholder for the real 4-vectors

new_4Vec = original_4Vec + errors

Each of these variance and covariance terms has an uncertainty associated with it (the error on the error, so to speak).  You vary these terms within the scale of that error (upper bound and lower bound) to quantify the contribution to the error in the observable from each component.
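A minimal sketch of this "error on the error" variation, assuming placeholder numbers throughout (the real variances and their uncertainties come from your covariance fits):

```python
import numpy as np

# Sketch of varying each covariance term within its own uncertainty
# ("error on the error").  All numbers here are placeholders; the real
# variances and their uncertainties come from the covariance fits.
rng = np.random.default_rng(0)

cov_nominal = np.diag([0.04, 0.04, 0.25, 0.25])  # placeholder (px, py, pz, E) variances
cov_uncert  = np.diag([0.01, 0.01, 0.05, 0.05])  # placeholder uncertainty on each variance

def smeared_width(cov, n=100_000):
    """Smear with a given covariance and return the spread of a toy observable (pz)."""
    errors = rng.multivariate_normal(np.zeros(4), cov, n)
    return errors[:, 2].std()

# Lower- and upper-bound covariances bracket the observable's spread;
# the difference quantifies the contribution from that term's uncertainty.
lo = smeared_width(cov_nominal - cov_uncert)
hi = smeared_width(cov_nominal + cov_uncert)
```

The same scan would be repeated term by term (including off-diagonal terms) to rank each component's contribution.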

This is all done with MC.  Once we have a hold of this using MC, where we can test how well we recover the true values, we then proceed to the experimental data.  For the experimental data, my suggestion would be to impose whatever physics constraints make sense and iterate as many times as needed to get stable results.  One constraint might be to make sure lambda < 1, but we will have to look into this a bit.  The iteration part here means to do an initial extraction and get [lambda, mu, nu], then use these results in your MC generator to produce MC, so that you can use this MC to retune the model for the best extraction using those parameters.  After you have retuned, apply the new model to the experimental data again and get a better extraction, and so on.
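The iterate-until-stable loop could look like the sketch below, where extract() is a toy stand-in for the real chain (generate MC with the current [lambda, mu, nu], retune the model, re-extract from the data); here it just pulls the parameters toward the data so the loop demonstrably converges:

```python
import numpy as np

def extract(data, model_params):
    # placeholder for: generate MC with model_params -> retune model -> refit data
    return 0.5 * model_params + 0.5 * data

def iterate_extraction(data, params0, tol=1e-4, max_iter=20):
    """Repeat extraction until [lambda, mu, nu] stop changing within tol."""
    params = np.asarray(params0, dtype=float)
    for _ in range(max_iter):
        new_params = extract(data, params)
        if np.max(np.abs(new_params - params)) < tol:  # stable result reached
            break
        params = new_params
    return params

toy_data = np.array([0.9, 0.05, 0.1])             # toy "experimental" [lambda, mu, nu]
result = iterate_extraction(toy_data, [1.0, 0.0, 0.0])
```

Any physics constraints (e.g. lambda < 1) would be imposed inside the loop before the parameters are fed back to the generator.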

Software Repository and Documentation - At the time we give them the QTracker note to review, we should have the software in a GitHub repository with detailed documentation on each part.  I think we may need to make two: one for your future needs and one for the group version.  I will help with the documentation on the group version.  We should get a version of this up soon and give it to others in our group to start playing around with before we let Kun and Kenichi see it, so folks can help us make things clear.

Complete angular distribution write-up for PRD - We will have to submit a formal proposal for this analysis to be recognized through SeaQuest.  I will draft this soon, and you should go over it and add what is needed.  We write up in a formal Phys. Rev. article the full analysis with high-level details from the QTracker note, but focus on the extraction of lambda, mu, nu.  It is good if you can try to write most of this, including the physics motivation.  I will certainly help as needed.  This paper should include the detailed error analysis and complete systematic studies: a much more comprehensive write-up of the actual extraction than the QTracker note.  We will likely try to present both 1 bin averaged over kinematics as well as fine bins for future BM analysis, but this depends on our final statistics.  The SeaQuest review of this paper should be underway before the defense.  This review need not be complete by the time you defend, but we need to have made this paper as detailed and clear as possible so that such a review should be pretty straightforward.  Again, there is no requirement to have this published by the time of the defense.  I will work to try to make that happen on my own after you graduate.  I'm sure I will need your help to update things in the paper, but we can deal with that after the defense.

Update to the Thesis - The thesis is already in good shape, but you will have to update it with the actual physics extraction of lambda, mu, nu, as well as anything that we realized we needed to update in the QTracker part of the write-up from Kun and Kenichi's review.
The physics part of the thesis should be updated to reflect what is in the PRD write-up with all needed details.  They should ultimately be pretty similar.


So, for SeaQuest what we will need is a proposal (I'm doing this, but we will need to use the full SeaQuest data set), an analysis note for the extraction of the angular distribution (we are doing), and a draft paper (you are doing).  The analysis note will just be derived from the QTracker note (based on Kun's suggestions) and the PRD paper for the physics.  The QTracker note need not have the full details and conclusion of the angular distribution, but I've already mentioned that in the last email.  We just need preliminary results showing the scale of the error and how the error propagates.

To be released as a preliminary result, the analysis would still need to go through a SeaQuest review, although the requirements would be slightly different. SeaQuest bylaws say that an analysis note can be a modified version of an analysis section of a dissertation. Here is the timeline that that needs to follow:

35 days before -- declaration of release topic & date
28 days before -- analysis note on DocDB
14 days before -- release presentation at SeaQuest analysis meeting.


It will be easier to get this together by thinking the other way around: we do what we need for our reviews, then take what we need to update the thesis, like I mentioned earlier.


If we aim for a defense date in late May or early June, that means that this process would need to start in late April or early May at the latest. That seems doable, as much of the work of the analysis is done. Donal indicated that he would want a draft of my dissertation about 4 weeks before my defense, which also aligns with this timeline, assuming the analysis note and that chapter of my dissertation are approximately the same thing.

Having the thesis ready 4 weeks before the defense is a pretty normal requirement.  Once we agree on what needs to be done, it will be easier to set a date for the defense.  Hard to do at the moment; we need the agreement part first.


Based on your comments, I feel that to get the QTracker note to a place where it is ready both to give to Kun and Kenichi, as well as in a format that we can submit as a paper, is not achievable before mid-to-late April. 


The first part of this should be achievable pretty quickly, assuming we do what I mentioned and get some preliminary example of the extraction in there soon.  Organizing things to be in a publishable form should not get in the way of giving Kun and Kenichi something to get started on.


That would assume that I finish everything but the angular extraction in the note by the end of next week (March 29). I would then expect you to review it and we make changes within a week (by April 5). If the angular extraction analysis and write-up was finished during that week (which I think is a bit overambitious), and then you reviewed it, that would maybe allow us to be finished with the substance of the note by around April 12.

I don't think that the format that Kun and Kenichi need would work for an article, as they will expect more detail than is appropriate for a methods article. With changes and more edits, I can't see this all being done before April 19. Even with that timeline and your 3-week estimate, that puts us getting the review back around May 10.

The timeframe is hard to estimate right now because we need a finished QTracker note to give to them.  Let's finish that and start this process.  


I feel that a simpler analysis would suffice for the QTracker note, which would allow us to accelerate this by quite a bit. I would suggest we use a mass curve fit, which would be simpler both for them to review and for us to finish quickly. I also think that we can give them a note first, then put it in the article format after. That would allow us to give a note to them probably in the first third of April, and get it back by late April or early May.

We definitely need the mass spectrum proof, that goes without saying.  The preliminary angular distribution analysis is about what I mentioned above demonstrating utility as well as error minimization and propagation.  Like I mentioned this need not be complete.  Even what we have now is nearly there.


During the time that they are reviewing the note, we could work on converting the QTracker note into a paper, finish the analysis and analysis note, and I can work on my dissertation. If we submit the analysis note for review and I have a mostly completed dissertation by early May, that gives a month with relatively little to do, which could be used to make the angular dependence analysis into a paper.


Definitely things should be done in parallel.  


So here is the timeline I suggest, mostly with Fridays as the benchmark for different steps to be done:

  • By March 29: Arthur completes QTracker note with testing and mass-curve fit as proof of concept.
  • By April 5: Dustin reviews the note, and we make any necessary changes for clarity. Give the note to Kenichi and Kun for review.
  • By April 19: Arthur completes the angular dependence analysis and analysis note. Arthur sends an email to the SeaQuest collaboration declaring our intention to release a preliminary result (must be at least 35 days before defense).
  • By April 26: Dustin reviews analysis note, edits are made, and it is uploaded to the docdb to begin the review process (must be at least 28 days before defense). This will be around the time we get the review back for the QTracker note.
  • By May 3: QTracker note is in the form of a paper, which is the responsibility of both Arthur and Dustin.
  • By May 10: Arthur has a complete draft of his dissertation, which can be sent to the defense committee for their review. At this time, we can schedule a defense date.
  • By May 17: QTracker paper edits are complete, and paper is submitted. Arthur has given a release presentation at the SeaQuest analysis meeting (must be at least 14 days before defense).
  • May 10 - ??: Arthur and Dustin work on making angular dependence analysis a paper. When complete, it can start the SeaQuest review process. Arthur edits his dissertation as necessary.
  • Week of June 1-5: Arthur defends. This is dependent on the availability of the committee, so the further out we schedule the better. Obviously if someone is not available that week, this may need to be moved.



Does this timeline make sense to you? I've tried to be ambitious but as realistic as possible. This timeline would allow us to finish in a timely manner and get things moving with the necessary reviews on shorter timescales than waiting for things to be in the form of a paper.

Again, we really need to agree on what we are doing before we can agree on timeline.
Let's get there soon.


Det IDs   Detector      Max elements
1-6       St 0          201
7-12      St 1          384
13-18     St 2          128
19-24     St 3p         112
25-30     St 3m         112
31-46     Hodoscopes     20
47-55     Prop tubes     72
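For reference, the detector-ID groups and per-detector element counts listed above could be captured as a lookup, e.g. for sizing padded hit arrays in QTracker input. This dict is purely an illustrative sketch of the numbers in the table:

```python
# Detector-ID ranges and max element counts from the table above.
DETECTOR_GROUPS = {
    "St 0":       (range(1, 7),   201),
    "St 1":       (range(7, 13),  384),
    "St 2":       (range(13, 19), 128),
    "St 3p":      (range(19, 25), 112),
    "St 3m":      (range(25, 31), 112),
    "Hodoscopes": (range(31, 47),  20),
    "Prop tubes": (range(47, 56),  72),
}

def max_elements(det_id):
    """Return the max element count for a detector ID, or None if unlisted."""
    for ids, nmax in DETECTOR_GROUPS.values():
        if det_id in ids:
            return nmax
    return None
```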


----------------------------------------------------------------
Requirement of requesting a release of preliminary result
----------------

1. Analysis note
 * Any format (like slide) is acceptable
 * Be submitted to DocDB in time
 * Required contents = all that lets anyone reproduce the result quantitatively
   (These are necessary, but may not be sufficient)
  - Inputs used (names of DB schemas, ROOT files, etc.)
  - Constants used
  - Formulae used
  - Step-by-step quantities obtained
  - Reference is usable and even encouraged
  - The results must be cross checked

2. Release presentation (or meeting)
 * A fully-detailed explanation is not necessary
   (assuming that the analysis note is completed).
 * Questions and answers based on the analysis note

----------------------------------------------------------------
Timeline for requesting result release
----------------

If the result is to be presented on day "X", then on day:
 X-35 day | declaration of release topic & date (to call for everyone's notice)
 X-28 day | analysis note on DocDB
 X-14 day | release presentation
 X    day | release date (conference etc)

While these may seem excessively early, experience has shown us that
there are frequent questions and comments raised during this process.
Having sufficient time to address these concerns is important, and not
having them sufficiently addressed could cause the release to not
happen.

----------------------------------------------------------------
Tag naming
----------------
There was also a discussion of "preview" vs "preliminary".  The
general consensus was that we should no longer use the word "preview",
with the caveat that "preliminary" implies the analysis is at a stage
where the result is close to publication quality.

================================================================


Updates Needed to the Qtracker Analysis Note:

It's a good start but I think there are some updates we should shoot for to make things easier to read, understand, and evaluate.  Also, there seems to still be quite a bit of missing information so far but perhaps you are still organizing things.

It is going to be easier to read and understand if the details of a particular algorithm are covered together with the actual testing and quantifying aspects.  I suggest a high-level introduction explaining what we are doing and what the goals are, along with the philosophy of the approach.  Then explain each section of QTracker, putting each algorithm's function and performance together, and provide a detailed quantitative analysis of each section in which we anticipate what others will ask about that particular part of QTracker.  In this way each section of QTracker is somewhat like a stand-alone analysis note for Event Filter, Track Finder, Track Reconstruction, and Vertex Reconstruction.

Each section should contain the performance and the training details (and the interdependence of those two) from training optimization in each case.  The training should have examples of how to produce the exact MC that was used, both using Geant (Fun4Sim) and the UVA type of messy event needed for training each part.  Enough information needs to be provided that someone could follow the steps and reproduce exactly what was done.  Details about training parts together should be included as needed; for example, track finding and reconstruction must be trained together in order to have the correct errors on the hits.

After this, a section about integration and how they all work together.  We then need a detailed performance analysis of the algorithm as a whole, including: cut analysis, background subtraction using MC with combinatorics (MC or data), mass distribution improvement at each stage showing the tracker's capacity to manage various types of backgrounds, dump separation, minimization of combinatorics, timing, statistics, and mass distribution comparisons to KTracker.


Need to fix and add detailed study about background separation and final signal-to-noise for combinatorics, J/psi purity, and DY purity.

Also need a section here about integration of parts and their interdependence and error propagation depending on the previous part.

More details and performance measures about the iterative vertexing and reconstruction, and how much that improves things, step by step.

Also need a section on the experimental error propagation and an example of the physics extraction.


General Questions and Comments we can discuss and figure out:

Need some configuration settings in QTracker_Run so you can choose what you are saving from the models and what cuts you want (choose what to save).

Where is the rest of the E906 data (/project/ptgroup/seaquest/data, /project/UVA-Spin/seaquest/data: before new chambers)

What are the units of the output (GeV)

Was the track finder and reconstruction network trained together as I mentioned?  If so, what was the error that propagated based on track finding efficiency? Similarly for vertex.

After Track Finder was trained, we needed to test on non-dimuon pairs that still have a similar mass range as our mass of interest.

The vertex filter only works for dimuons.  We will probably need a track filter that ensures that the single tracks we are looking at are coming from the target before dimuons are checked. This should be determined by the reconstructed 4-momentum and track hit pattern.

Studies before and after detector efficiencies were implemented checking effects on tracker and background.  Where does 94% for DC come from? (94% is still a good "average" based on previous and recent 906 studies.  Lower set is 89% and higher set is 97%, so this is fine)  It should change as a function of the station right? (yes, these studies can be done once all else is complete)

We should expect to see a difference between the distributions of Px, Py (~2 GeV) and Pz (~50 GeV), but we are not seeing that (this is probably because of low statistics).

Why are there these events with 1E6 in momentum?

Why are there so many high momentum events? Even with no probability selected there are a lot of ridiculously high momentum events which means there are muons there with no real meaning or value to reconstruct.  

How do we read in the MC? This must already be ready to go; it would have been necessary to test with MC along the way of putting this together, so there shouldn't be any changes needed (other than file type).

We have equivalents of Dimuon selection Table 4.7 in Kei

What sort of cut testing has been done with Qtracker as a whole

What sort of MC testing has been done with Qtracker as a whole

Why is Hodoscope masking not discussed in the note and there are no controls or cuts in the tracker code?

  • Hodoscope masking is done in KTracker to reduce multiplicity in the drift chambers and improve reconstruction time.
  • KTracker algorithm is O(n!), while QTracker is O(1). We try to do as few hit cuts as reasonably possible with QTracker to avoid removing real dimuons.

How do you know the majority of the combinatoric background is from the dump?  How do you quantify that?



What is the result of the following studies:

Z-momentum at the various DC stations (MC and 906)

How does the selected vertex position look compared to all track vertex

What does mass distribution look like before and after each timing and analysis cut and any track selection criteria.  

Check that all momentum distributions make sense statistically with respect to KTracker and what's expected for MC and 906.

How is accuracy and precision changing through the iteration process and training assuming different vertex starting points?

What is the relationship between tracks recovered vs hit interpolation and error in the reconstruction (or track quality)?  What are the limitations and cut offs in this interpolation of the 34 hits?  Can it still work if 4 hits are missing…

Check x-momentum from all tracks vs selected Dimuon tracks: for mu+ all positive, for mu- all negative.  Check for both MC and 906.

How much combinatorics makes it through at each step (using only MC or isolated combinatorics).

Also check RF time, hodo TDC time, changes to in-time cuts over runs (in time flag, done prior for each run).

Check dimuon mass distribution after every step and study signal to background ratio optimization using all track selection and dimuon selection criteria.

Check the dimuon mass distribution after QTracker probability selection with both MC and 906.  Vary the threshold and see what the optimal probability selection is for highest statistics and lowest background.
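A sketch of that optimization, scanning the dimuon-probability cut for the best trade-off between statistics and background using S/sqrt(S+B) as a simple figure of merit. The probability arrays here are toy placeholders for real QTracker output on MC signal and background (or 906 data):

```python
import numpy as np

rng = np.random.default_rng(1)
p_sig = rng.beta(8, 2, 10_000)   # toy signal probabilities (peaked near 1)
p_bkg = rng.beta(2, 8, 50_000)   # toy background probabilities (peaked near 0)

# Scan the probability threshold and evaluate significance at each cut.
thresholds = np.linspace(0.0, 0.95, 96)
fom = []
for t in thresholds:
    S = float(np.sum(p_sig > t))  # signal surviving the cut
    B = float(np.sum(p_bkg > t))  # background surviving the cut
    fom.append(S / np.sqrt(S + B) if (S + B) > 0 else 0.0)

best = thresholds[int(np.argmax(fom))]  # cut value with the highest significance
```

In practice the same scan would be run on the mass distribution itself, comparing MC and 906 side by side.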


Target and dump separation and associated background using vertex, classifier, and error from the regression quality of the 4-momentum.  We can also make filters to remove tracks that are coming from the dump based on hit pattern at the event level.


Questions about Output:

Row 1: Probability that the event has no reconstructable muons.

Row 2: Probability that the event has one reconstructable muon.

Row 3: Probability that the event has two reconstructable muons of the same sign.

Row 4: Probability that the event has two reconstructable muons of opposite signs (dimuon classification).

Row 5: Probability that the event has three reconstructable muons of the same sign.

Row 6: Probability that the event has three reconstructable muons, two of the same sign, one of the opposite sign.

In the output of QTracker, why do we have listings of so many probabilities?  Why not handle this all within the QTracker cuts, allowing the user to select what muon combination they want to save in the file?

Why are we outputting no vertex information?  We need this for dimuons and single tracks coming from the target (given selection).

It would help greatly if we had track error as mentioned in my email.

It would help if we had this info in the output or controlled by the Qtracker_run:

a.) Track quality metric

b.) Hits (DC, hodo, prop) and DC drift time associated with each track

c.) Vertex for single muon, dimuon, assume z-axis, assume target

d.) Hit TDC, RF time, any other helpful timing information for event or from hits used

e.) Cherenkov information, not sure how to use this but I believe it's needed for analyzing rate effects as well as luminosity and total yields.

