Glotaran Screencast Transcript – Getting started with Glotaran
Hello and welcome to this Glotaran screencast. Glotaran is an open-source, cross-platform application designed to facilitate simple global and target analysis, in particular of time-resolved spectroscopy measurements. In this screencast I will give you a brief demonstration of the workings of Glotaran by analyzing a sample dataset included with the program.
1. Starting the application
The first time you start Glotaran, your window will look something like this. On top of the application window is the menu bar, and directly below that the toolbar. Using the buttons on the toolbar one can open an existing project, or create a new project. The same options are also available from the file menu.
2. Creating a new project
We start this demo by creating a new project. In the New Project Wizard window that opens, select the Example category and then select the 'Demo Project'. Press 'Next' and specify a name, for instance 'Demo'. Press 'Finish' to create the project. The newly created project opens directly in the project tab, and a folder structure for it is created automatically.
3. Adding a dataset to the project
Before we can start an analysis we first need to import the data to analyze. We do this by right-clicking on the “Datasets” folder and selecting 'Open dataset file'. We can then open any of the supported data files. A description of each of these file types can be found on the Glotaran website, under the documentation link. If the data file is recognized by Glotaran, you can open it by selecting the file and pressing 'Finish'. Note that it is also possible to open multiple files simultaneously by selecting more than one file using CTRL-click. Once you press 'Finish', the selected data files will be added as nodes under the Datasets folder.
4. Opening a dataset
In this case, because we started with the demo project, we already had a dataset available to us, represented by this demo_data node. Every data file you open is represented by a data file node in the “Datasets” folder. This is just a shortcut to your data file, if you will, not the actual file. So when you delete a node it is removed from your project without touching the actual data file, which is what I will do for the two files I added earlier. Now, we can open a data file by double-clicking a node, or by right-clicking and selecting “Display Dataset”. The dataset will be opened in the data editor window in the center of the application.
5. The data editor window
The data editor allows us to visualize and inspect the data. In this example we are looking at a transient absorption spectroscopy measurement. The top left shows a 2D intensity map of the dataset. Here wavelength is displayed on the horizontal axis and time is displayed on the vertical axis. The amplitude of the data points is represented by the color scaling. Moving the sliders on either side of the intensity map allows us to look at individual traces and spectra in the dataset. Zooming in is possible by clicking in the chart and dragging the mouse from top left to bottom right. Zooming out is possible by clicking and making the reverse motion. Zooming in on a particular region of the data allows one to select a subset of the data for analysis; for instance, one could trim the noisy edges before analysis. Alternatively one can press the select button and specify a window very precisely. Other basic pre-processing methods are: averaging, resampling, baseline subtraction and outlier correction.
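For readers who want to see what selecting a window amounts to outside the program, here is a minimal sketch in Python with NumPy. It is only an illustration, not Glotaran's own code: the function name select_window, the made-up axes and the assumption that rows are time points and columns are wavelengths are all hypothetical.

```python
import numpy as np

def select_window(data, times, wavelengths, t_range, wl_range):
    """Return the part of the data matrix inside a time/wavelength window.

    Assumes rows of `data` are time points and columns are wavelengths.
    """
    t_mask = (times >= t_range[0]) & (times <= t_range[1])
    wl_mask = (wavelengths >= wl_range[0]) & (wavelengths <= wl_range[1])
    return data[np.ix_(t_mask, wl_mask)], times[t_mask], wavelengths[wl_mask]

# Made-up axes and random data, only to show the call.
times = np.linspace(-1.0, 9.0, 11)           # ps
wavelengths = np.linspace(400.0, 700.0, 31)  # nm
data = np.random.default_rng(0).normal(size=(times.size, wavelengths.size))

# Keep 0 to 5 ps and 450 to 650 nm, trimming the edges of the measurement.
sub, sub_t, sub_wl = select_window(data, times, wavelengths, (0.0, 5.0), (450.0, 650.0))
```

The returned axes are trimmed together with the matrix, so they stay aligned with the selected data.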
6. Pre-processing of data: baseline subtraction
Baseline subtraction is probably the most used and most useful method of data pre-processing. It allows one, for instance, to subtract an average spectrum from the data. The default option subtracts the average of a number of spectra at the beginning of the dataset, which can correct for any unrealistic contributions to the data before the moment of excitation. Of course, any form of pre-processing should always be used with caution so as not to introduce any artifacts.
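The default baseline subtraction described here can be sketched in a few lines of Python with NumPy. This is a hedged illustration rather than the program's implementation: the function name subtract_baseline, the parameter n_spectra and the synthetic data are assumptions, and rows are taken to be time points.

```python
import numpy as np

def subtract_baseline(data, n_spectra=10):
    """Subtract the mean of the first `n_spectra` spectra (rows) from every spectrum."""
    baseline = data[:n_spectra].mean(axis=0)  # one average value per wavelength
    return data - baseline

# Synthetic example: a constant offset per wavelength, with signal starting at row 5.
offset = np.array([0.1, -0.2, 0.3])
data = np.tile(offset, (20, 1))
data[5:] += 1.0

corrected = subtract_baseline(data, n_spectra=5)
```

Note that n_spectra should only cover spectra recorded before the moment of excitation; otherwise real signal is subtracted along with the baseline.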
7. Data inspection: singular value decomposition (SVD)
The most basic form of analysis is available directly from the data editor window. When we select the SVD tab, we are presented with the singular value decomposition of the data matrix. The SVD is a mathematical operation which separates the contributions to the data into linearly independent vectors, each scaled by a number. Here the left singular vectors (LSV) represent time traces and the right singular vectors (RSV) represent spectra. The nth contribution to the data is the product of the nth left singular vector and the nth right singular vector, scaled by the nth singular value. The singular vectors are ordered based on their contribution to the data, as represented by the magnitude of the singular values shown in a scree plot.
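To make the decomposition concrete, the following sketch uses NumPy's np.linalg.svd on a synthetic data matrix. Everything about the data (two exponential decays, two Gaussian-shaped spectra, the noise level) is made up for illustration, and rows are assumed to be time points.

```python
import numpy as np

rng = np.random.default_rng(1)
times = np.linspace(0.0, 100.0, 200)        # ps
wavelengths = np.linspace(400.0, 700.0, 50)  # nm

# Synthetic data: two decaying components with different spectra, plus noise.
traces = np.exp(-np.outer(times, [0.5, 0.05]))           # shape (200, 2)
spectra = np.stack([
    np.exp(-((wavelengths - 500.0) / 30.0) ** 2),
    np.exp(-((wavelengths - 600.0) / 40.0) ** 2),
])                                                        # shape (2, 50)
data = traces @ spectra + 1e-3 * rng.normal(size=(times.size, wavelengths.size))

# Columns of U are left singular vectors (time traces), rows of Vt are
# right singular vectors (spectra), s holds the singular values (descending).
U, s, Vt = np.linalg.svd(data, full_matrices=False)

def contribution(n):
    """n-th contribution: outer product of the n-th vectors scaled by s[n]."""
    return s[n] * np.outer(U[:, n], Vt[n])

# Summing all contributions gives back the data matrix.
reconstructed = sum(contribution(n) for n in range(s.size))
```

Plotting s against its index gives the scree plot; in this synthetic case the first two singular values stand out from the rest because only two components were put in.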
8. The use of SVD analysis
SVD analysis allows one to make a rough estimate of the number of rate constants or lifetimes that would be necessary to describe these data in global analysis. By increasing the number of displayed singular vectors we can see that in this case there seem to be at least one, two, three, …, three linearly independent vectors significantly different from the noise, as observed from the presence of significant structure in both singular vectors. The 4th singular vector is clearly lacking structure in the left singular vector. Therefore three rate constants would be a good number to start global analysis with. Note that, other than looking at the singular vectors individually, it is possible to use a limited number of singular values in SVD filtering; again a data manipulation technique to be used with much caution.
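SVD filtering as mentioned here means rebuilding the data from only the first few singular values and vectors. The sketch below shows the idea with NumPy; the function name svd_filter and the synthetic test matrix are assumptions for illustration, not part of Glotaran.

```python
import numpy as np

def svd_filter(data, n_components):
    """Rebuild `data` from its first `n_components` singular values and vectors."""
    U, s, Vt = np.linalg.svd(data, full_matrices=False)
    return (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]

# Synthetic rank-3 matrix with a little noise added.
rng = np.random.default_rng(2)
clean = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 25))
noisy = clean + 1e-3 * rng.normal(size=clean.shape)

filtered = svd_filter(noisy, n_components=3)
```

The caution in the text applies directly: choosing n_components too small throws away real signal, and no fit can recover it afterwards.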
9. Creating a new dataset
We now go back to the Data tab. In order to create a dataset for analysis we first have to select a region we are satisfied with, like so, and then press 'Create Dataset'. We specify a name for the dataset and press 'OK'. Now a dataset node has appeared under the data file node in the project management window. This is the type of dataset we can later use for data analysis.
10. Creating a new analysis scheme
In order to start analysis of the dataset we first need to create a new analysis scheme. We do this by right-clicking the Analysis Schema folder in our project and selecting 'New → New Analysis Scheme' or, if you do it for the first time, 'New → Other → New Analysis Scheme'. We specify a file name and press 'Finish'; the scheme file is created as a node under the Analysis Schema folder and opens automatically in the editor window. This now is our analysis canvas. There are three essential ingredients to be added to this canvas. First there is a so-called 'Dataset Container', which holds the datasets used in the analysis. Second there is the 'model' container, which represents the model used for global or target analysis; this can be created directly from the palette or by dragging an existing model file onto the canvas. Note that any container can be deleted again by right-clicking on the label and selecting "Delete node". Third there is the so-called output container, which is used to specify where the results should be written and how many iterations the optimization algorithm should run for.
11. Setting up the analysis scheme
Now, before we can do anything else, all of these containers need to be linked to each other by connection arrows. One can do this by holding the CTRL key, clicking the label of the container, and dragging a connection from one container to another. The model container needs to be connected to a dataset container, which in turn needs to be connected to an output container. Only then can the analysis be run. If it sounds a bit tricky, that’s because it is. A future version of Glotaran will simplify this process.
12. Starting a new analysis
We now have all the ingredients to start an actual analysis. First, we drag the dataset we created earlier into the dataset container. Second, we need to specify the model. The model is currently empty, but we can drag and drop nodes from the 'Modelling' category on the Palette to specify our model. Available are: Kinetic Parameters, Instrument Response Function (IRF) parameters, Dispersion parameters, Weight Parameters and Coherent Artifact parameters. The KMatrix node is for advanced users and is used when doing target analysis. For global analysis only the kinetic parameter node is needed. The modelling node can be dragged and dropped into the model container. Once we have a node, its properties can be changed by means of the properties window. An explanation of all of these properties can be found in the online documentation. Now, before we proceed, we will take a small intermezzo to take another close look at our dataset in order to postulate a model that can describe the data.
14. Intermezzo: re-examining the dataset
The goal of this intermezzo is to take another look at the dataset and look for clues as to what the underlying model could be. One of the first things we notice from the 2D intensity map is dispersion: the moment of excitation, or time-zero, is dependent on the wavelength. The dispersion can be parametrized in our model; we will get back to this later. First we look at the most important aspect of our analysis: a correct description of the Instrument Response Function or IRF. The IRF represents the response of the sample to the initial excitation pulse. Typically this is modeled by a Gaussian function with a specific position and width. We can come up with good starting values for these parameters by looking at the data. If we zoom in we see that the moment of excitation is around 1 ps and there is a rise of the IRF of about 0.2 ps. We will remember these values for later.
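As a small illustration of what a Gaussian IRF with a position and a width looks like, here is a sketch in plain Python using the starting values read off from the data. Interpreting the width as the full width at half maximum (FWHM) is an assumption made for this example; the convention Glotaran uses for the width parameter should be checked in its documentation. The function name gaussian_irf is hypothetical.

```python
import math

def gaussian_irf(t, position=1.0, fwhm=0.2):
    """Area-normalized Gaussian at time t (ps).

    Assumption: `fwhm` is the full width at half maximum; it is converted
    to the standard deviation sigma of the Gaussian.
    """
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((t - position) / sigma) ** 2) / norm

peak = gaussian_irf(1.0)       # value at the IRF position
half = gaussian_irf(1.0 + 0.1)  # half a FWHM away from the position
```

With this convention the value half a FWHM away from the position is half the peak value, which is a quick way to check a width estimate against the rise seen in the data.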
Intermezzo: dispersion parameters
For the dispersion it is necessary to decide on a central wavelength - here we could choose for instance 530 nm - relative to which we describe the change in the position of the IRF. Typically this can be modelled with a second or third order polynomial, so one needs to estimate 2 to 3 reasonable starting values that describe the change in the position of the IRF as a function of wavelength with respect to the central wavelength that we have chosen. I will leave this as an exercise to the viewer, but if you get stuck simply take a look at the analysis scheme that comes with this demo project.
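The polynomial description of the dispersion can be sketched as follows. This is an illustration under stated assumptions, not Glotaran's exact parametrization: the function name time_zero, the division of the wavelength difference by 100 and the coefficient values are all hypothetical.

```python
def time_zero(wavelength, irf_position, coefficients,
              central_wavelength=530.0, scale=100.0):
    """Position of the IRF (ps) at a given wavelength (nm).

    The polynomial has no constant term of its own: at the central
    wavelength the result is just the IRF position.
    Assumption: the wavelength difference is divided by `scale` to keep
    the coefficients at a convenient magnitude.
    """
    x = (wavelength - central_wavelength) / scale
    return irf_position + sum(c * x ** (i + 1) for i, c in enumerate(coefficients))

# Made-up third order coefficients, not the values from the demo project.
coefficients = [0.5, -0.1, 0.01]
at_center = time_zero(530.0, 1.0, coefficients)
at_630 = time_zero(630.0, 1.0, coefficients)
```

A second order polynomial is obtained simply by passing two coefficients instead of three.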
Intermezzo: singular value decomposition
For the final part of this intermezzo we revisit the SVD of the data, because the next step is to estimate the number of components we want to attempt to resolve. As we saw earlier in this screencast, SVD is a tool that can help you with this. We found 3 singular vectors with a contribution to the data significantly different from noise. For every component we need to specify a starting value. Usually the best way to do this is to look at the SVD and estimate the order of the timescale at which the spectral properties change. Here we can see that for this dataset these processes occur within 1 to 100 ps. Since a rate constant is the reciprocal of a lifetime, reasonable starting values for the kinetic rate constants would be between 1 and 0.01 (in units of 1/ps).
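The step from timescales to starting rates is just a reciprocal, as this short Python sketch shows. The three lifetimes are made-up guesses spread over the 1 to 100 ps range, not the values used in the screencast.

```python
# Hypothetical lifetime guesses (ps), spread over the observed 1 to 100 ps range.
lifetimes_ps = [1.0, 10.0, 100.0]

# A rate constant is the reciprocal of a lifetime, so the units are 1/ps.
starting_rates = [1.0 / tau for tau in lifetimes_ps]
```

Spreading the guesses roughly evenly on a logarithmic scale is a common choice when all that is known is the range of timescales.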
Kinetic parameters (Kinpar)
Going back to the analysis schema we can now start building the model with the parameters we just investigated. We add 'Kinetic Parameters' from the palette, change the number of components to 3, and expand the 'KinPar' node. Now we specify some reasonable starting values within the range of rates that we estimated by looking at the SVD of the data. In this case between 1 and 0.01 [...] In addition to the number of parameters, the kinpar node also allows us to choose whether we want to use a sequential model or a parallel model for global analysis. The parallel model is the default, but in this case we will postulate a sequential model. Finally, the kinpar node allows us to restrict the estimated rates to be strictly positive, which we will enable.
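The difference between a parallel and a sequential model can be illustrated by computing the concentration profiles for both. The sketch below is a simplified illustration with NumPy, not the program's implementation: it ignores the IRF and dispersion, assumes all rates are distinct, and uses a hypothetical function name and made-up rates. In the parallel case every component starts with unit population and decays on its own; in the sequential case only the first component is excited and each one feeds the next.

```python
import numpy as np

def concentrations(rates, times, sequential=True):
    """Concentration profiles, shape (len(times), len(rates)), for times >= 0.

    Parallel: independent exponential decays.
    Sequential: 1 -> 2 -> ... -> n, the last component decaying to the ground state.
    Assumes distinct rates and no convolution with an IRF.
    """
    rates = np.asarray(rates, dtype=float)
    times = np.asarray(times, dtype=float)
    n = rates.size
    if not sequential:
        return np.exp(-np.outer(times, rates))
    # Transfer matrix: decay on the diagonal, feeding of the next component below it.
    K = np.diag(-rates)
    K[np.arange(1, n), np.arange(n - 1)] = rates[:-1]
    c0 = np.zeros(n)
    c0[0] = 1.0  # all population starts in the first component
    # Solve dc/dt = K c through the eigendecomposition of K.
    eigvals, eigvecs = np.linalg.eig(K)
    coeffs = np.linalg.solve(eigvecs, c0)
    return ((eigvecs * coeffs) @ np.exp(np.outer(eigvals, times))).T.real

times = np.linspace(0.0, 50.0, 101)  # ps
rates = [1.0, 0.1, 0.01]              # 1/ps, made-up values
seq = concentrations(rates, times, sequential=True)
par = concentrations(rates, times, sequential=False)
```

In this picture, the spectra that go with the parallel profiles are the decay associated spectra and those that go with the sequential profiles are the evolution associated spectra.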
IRF Parameters (IRFpar)
For the IRF parameters we will use the default Gaussian model type and put in the parameters we estimated earlier, so 1 ps for the position and 0.2 ps for the width.
Dispersion parameters (Dispmu, Disptau)
Finally we specify the dispersion. First we change the central wavelength to the 530 nm we decided on earlier as it will affect our estimates for the parameters. We will use a third order polynomial to describe the dispersion, which will require only 3 parameters, as the offset is already given by the position of the IRF. Again we specify some reasonable estimates.
16. Finishing and saving the model
Now that the model specification is complete we save the model by pressing the 'Save All' button on the toolbar. We still need to decide what name to give the folder where the results are saved, by typing a name in the output path or pressing the dialog button, for instance "test". Next we need to decide what number of iterations to use and whether we want to calculate the error for the estimated spectra - a generally time-consuming process which we will only do when we have a good enough model. If we leave the number of iterations at 0 for the moment, we can test whether our model actually works or whether we made a mistake in the model specification, in which case the analysis will crash.
17. Running the analysis
To run the analysis you can now right-click the analysis node and select "Run analysis". Assuming you have Rserve running in the background, the analysis will start, as indicated by the progress indicator below. … After a few seconds to minutes, depending on the size of the data and the complexity of the model, a new folder with results will appear in the Results folder. In this folder you will see these files, indicating that the analysis was successful.
18. Inspecting the results - initial look: overview tab
Because this was only a trial run and we used 0 iterations there is not much point in looking into the results in detail. But what we can do is take a quick look at the main result file to see how good our initial guess of the parameters was. We open the file and are presented with the overview tab. Here we see the estimated Evolution Associated Spectra or EAS, the Decay Associated Spectra or DAS and their respective normalized versions. To the right we see the concentration profiles corresponding to the rate constants that we put in the model. As you can see, because we used 0 iterations, they are still the same. At the bottom we see the SVD of the residual matrix with the first and second singular vectors plotted in black and red respectively. Clearly there is still quite a bit of residual structure left, indicating that the model doesn't describe the data yet, but since we haven't iterated, and the parameters are still what we initially specified, this is not so surprising.
19. Inspecting the results - initial look: traces tab
Before we head back to the analysis scheme to let the program iterate a bit we first take a quick look at the traces tab. The traces tab presents us with a detailed view of the quality of the fit. For every time trace and spectrum we can inspect the quality of the fit, simply by dragging the sliders. As you can see, it's not even that bad; apparently our starting values were not too shabby. Overlaid on the 2D intensity map, in black, is the dispersion curve, which is currently far from optimal but again not too shabby. Finally we again see the SVD of the residual matrix. In this case, we can even zoom in and perform an SVD on part of the residual matrix for a better look. Confident that we are on the right track, let's now go back to the analysis scheme, set the number of iterations to a few more, say 5, and rerun the analysis.
20. Inspecting the results - closer look: summary file
After the analysis has completed successfully you can see that the old results are not overwritten but instead a new folder is created. This allows for comparison between different analysis runs. Now that we have iterated a bit, the first file we will look at is the summary file. It provides us with a textual summary of the analysis process. Given are: the number of iterations the analysis was run for, the calls that were made to R in order to perform that analysis, the final residual standard error and the values of the fitted parameters with their standard errors. If the analysis failed, because the model is wrong or our initial guess for the parameters is too far off, then this file will contain a diagnostic error message rather than the parameters.
21. Inspecting the results - closer look: overview file
The next file that we should look at is the overview file. It contains roughly the same information as the plain text summary file but it includes a graphical view of the progress of the residual sum of squared errors. The most important thing that this graph tells us is that the fit has converged, as there was little to no progress in the last iteration, from 4 to 5. It is an important criterion because before convergence we cannot really judge the quality of the model, and it will be hard to tell if the model needs another parameter or whether the program just needs to iterate a bit more. So always check convergence of the fit before interpreting the results.
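The visual check for convergence described here can also be written down as a simple rule: the fit is considered converged when the residual sum of squares barely changes in the last iteration. The sketch below is one possible way to express that in Python; the function name has_converged, the tolerance and the numbers in the example are assumptions, and this is not the criterion Glotaran itself applies.

```python
def has_converged(rss_history, rel_tol=1e-3):
    """True if the last iteration changed the residual sum of squares
    by no more than `rel_tol` relative to the previous value."""
    if len(rss_history) < 2:
        return False  # nothing to compare against yet
    previous, last = rss_history[-2], rss_history[-1]
    return abs(previous - last) <= rel_tol * abs(previous)

# Made-up residual sums of squares for iterations 0 to 5.
rss = [10.0, 4.0, 2.5, 2.01, 2.0001, 2.0]
converged = has_converged(rss)
still_improving = has_converged(rss[:3])
```

If such a check fails, the sensible next step is to run more iterations before considering a change to the model.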
22. Inspecting the results - closer look: overview tab
Satisfied that our fit has converged, it's time to take a closer look at the main result file with the estimated parameters and spectra. In principle every analyzed dataset will have a results file, so if you analyze multiple datasets simultaneously you will see multiple files appear. In this case we have only one. The first tab shows us again: (1) the estimated parameters, consisting of the three rate constants, the two IRF parameters (position and width), the reciprocals of the rates or estimated lifetimes, and the total RMS; (2) the estimated spectra; (3) the corresponding concentration profiles; and (4) the first two singular vectors of the SVD of the residual matrix.
23. Inspecting the results - closer look: enable lin-log
One of the first things to notice is the reduced structure in the singular vectors; however, it is hard to see because most of the data points are located within the first 100 ps, whereas the time axis runs until 900 ps. One of the things we can do is to simply zoom in, or even to open the graph in a new window, but still the data at early times can easily be obscured. This is actually a common problem in time-resolved spectroscopy, where one often measures many points in the beginning to capture the ultrafast dynamics and just a few later on. The problem of visualizing this kind of measurement can be solved by using a linear-logarithmic or lin-log time axis. This can be enabled by pressing the LinLog button on the toolbar. The number controls which part of the chart is linear; in this case the axis is linear from -1 to 1 ps and logarithmic thereafter. The axis is centered around time-zero, which is defined as the maximum of the IRF. The lin-log setting also applies to the traces tab. Here you can really see the difference in the time traces.
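One way to picture a lin-log axis is as a mapping of each time value to a plotting coordinate that is linear near time-zero and logarithmic further out. The sketch below shows one such mapping in plain Python; it is an illustration under assumptions, not necessarily the mapping Glotaran uses. Times are taken relative to time-zero, and the function name linlog is hypothetical.

```python
import math

def linlog(t, linear_range=1.0):
    """Map a time t (relative to time-zero) to a lin-log axis coordinate.

    Inside [-linear_range, linear_range] the coordinate equals t; outside,
    each factor of 10 in |t| adds one more `linear_range` to the coordinate.
    """
    if abs(t) <= linear_range:
        return t
    coordinate = linear_range * (1.0 + math.log10(abs(t) / linear_range))
    return math.copysign(coordinate, t)

# With a linear range of 1 ps, 10 ps and 100 ps land one unit apart.
coords = [linlog(t) for t in (0.5, 1.0, 10.0, 100.0)]
```

The mapping is continuous at the edge of the linear range, so traces do not jump where the axis switches from linear to logarithmic.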
24. Inspecting the results - closer look: interpretation
With the lin-log axis enabled we can see the remaining residual more clearly. Clearly there is some residual on the very short time scale which spectrally looks a lot like the spectrum at time zero. This could probably be captured by taking into account a coherent artifact which follows the time profile of the IRF. This is left as an exercise to the viewer.
25. Updating existing model with estimated parameters
One last tip before concluding this screencast. It is possible to update an existing model with the parameters estimated from the analysis. This can be done by dragging the main result file on top of a model while holding the CTRL key. Like so. The drop icon will change to include a plus, indicating that the drag and drop operation is indeed possible. You will then be asked which parameters of the model you would like to update with the estimated values from the analysis file. This little trick should make testing many different models less painful and allow you to zoom in on the right model more quickly.
Of course when you are finally happy with your model and the quality of fit, you still need to make sense of the analysis and try to give a physical interpretation to the results, and that is usually where the real challenge lies ... With that final remark I will conclude this screencast. I hope you enjoyed it and are now able to use Glotaran for your own data analysis. If you find any bugs, or you find you are missing certain features, please feel free to contact us and we will see what we can do ...