**Hardware**

In our lab we have recently switched to using a digitial I/O that connects via USB to the stimulus computer and have created two parallel port cords that can independently send separate triggers to different devices. There are several types of these digital I/O devices such as labjack and measurement computing. I have used devices from both companies, but have had an overall better experience with labjack so far. We have purchased the U3-LV ($108) with the extra CB-25 terminal board attachment ($29). We have attached 8 pins (FIO0-FIO7) to cat5 cables and grounds to a 25 pin parallel port connector and pins (EIO0-EIO7) and grounds to a separate parallel port connector. This allows us to send two separate 8-bit binary signals to different devices. Each of these parallel port connectors could also be split if some devices will be receiving the same signal. Additional details about the hardware setup can be found here.

**Software**

The next question is how to get your favorite stimulus delivery software to interface with the labjack. Our lab uses a variety of software (e.g., eprime 2.0, psychtoolbox, psychopy, and presentation) running on both windows and macintosh computers so we have implemented several different options. We typically need to send separate triggers to our medoc pathways pain delivery system and our biopac MP150 recording system. We have found that that labjack performs very well on both our windows and mac computers with a latency of approximately 8ms. We typically need to simultaneously trigger a specific pain intensity to the medoc pathways system and a trigger to the biopac psychophysiology system.

First you need to install drivers for the labjack. We are using the drivers provided by labjack for eprime in windows and the labjack exodriver for libusb on the mac installed via homebrew (e.g., *brew install libusb exodriver –universal*).

For eprime, we have created a function that is placed inside the user script. You then need to insert an initialization command in an inline script (e.g.,InitLabJack) and then call the labjack trigger function (e.g., TriggerLJ(Trigger1, Trigger2 (optional), “rev_channel” (optional))). This software is available in our github repository. If you are triggering a medoc pathways system you need to make sure that you reverse the binary code. We have an option to do this in your call.

'initialize InitLabJack 'trigger 46 degree temp and set all eprime triggers to 1 TriggerLJ(46, "rev_channel", 255)

For psychtoolbox, we are using an objected-oriented matlab function created by iandol available from his opticka github repository. This labJack class has a lot of functionality, but so far we have only been using the SetDIOValue method. The author has documentation about the available methods here.

% initialize lj = labJack('verbose',false) % send binary signal of 46 to medoc and binary signal of all 1's to biopac lj.setDIOValue([46,255,0],[255,255,255]) % reset all triggers to 0 lj.setDIOValue([0,0,0],[255,255,255])

There is also a nice toolbox called labjackpython that works well for making calls from PsychoPY.

For an alternative approach see Chris Rorden’s StimSync and advice on using the Measurement Computing 1208 data acquisition device.

Please let us know if you have any additional questions, run into any problems, or find additional resources to share.

]]>

**Model Fitting Overview**

The general idea of model fitting is to posit a theory about how something might work and test it by examining how well the theory can account for a dataset given a set of free parameters. In principle, I try to evaluate how well the model performs (i.e., the goodness of fit) relative to other competing theories. This comparison assumes that the model with the fewest number of free parameters that has the best goodness of fit for the most subjects is the best model to explain the data. To calculate the goodness of fit, it is often necessary to determine the optimal free parameters specified in the model, a process referred to as *parameter estimation.* The most common type of estimation techniques are *Least Squares Estimation*, which attempts to minimize the squared difference between the model prediction and the observed data, and *Maximum Likelihood Estimation, *which calculates the likelihood of a model explaining a dataset given a set of parameters. I recommend reading Myung (2003) for a tutorial on maximum likelihood. I typically use a two-stage multi-level framework to estimate parameters for a group. This is identical to neuroimaging analyses in which parameters are estimated for each individual subject and then averaged across participants. For hypothesis testing, I often use non-parametric statistical tests (e.g., signed rank test) to perform model comparison on model fits penalized for their number of free parameters (e.g., AIC, BIC ). I also occasionally use F and chi square tests to perform nested model comparisons and the vuong statistic for non-nested model comparisons. More recently, I have been playing with the stan statistical language which has both R and Python wrappers to perform hierarchical bayesian parameter estimation. For a more in depth overview of this process, I highly recommend an excellent chapter by Nathaniel Daw.

**Computational Model Class ( comp_model() )**

The computational model ( comp_model() ) class creates an object instance that is used to fit a computational model to a multi-subject dataset. The object uses the design_matrix() class for the data set, which has rudimentary functionality similar to an R data.frame(). There are additional fields for the model and parameters for the model fitting procedure such as the parameter constraints, number of iterations, and type of estimation (e.g., maximum likelihood or least squares). I use Matlab’s fmincon optimization function to find the parameters that result in the best fit of the model. This constrained optimization function is very efficient, but has a tendency to occasionally get stuck in a local minima. Using more iterations will rerun the model multiple times with different initial starting values which decreases the likelihood of fmincon getting stuck in local minimas, but increases the amount of time it takes to fit the models.

**Current Methods for comp_model** (inherits from design_matrix class too)

*avg_aic*– display average AIC value*avg_bic*– display average BIC value*avg_params*– display average parameter estimates*comp_model*– class constructor*fit_model*– estimate parameters using model*get_aic*– extract all subject’s AIC values*get_bic*– extract all subject’s BIC values*get_params*– extract all subject’s estimated parameters*plot*– plot average model predictions across subjects*summary*– display summary table for model*save*– save object as .mat file*write_tables*– write out parameter estimates and trial-to-trial predictions to csv data frame.

**Example Usage**

First we will load some example data using the importdata function and set some of the defaults settings for the optimization algorithm. The comp_model() function assumes that data is in the long format and that the subject indicator is the first column of the data frame.

% Load data basedir = '~/Dropbox/TreatmentExpectations'; dat = importdata(fullfile(basedir, 'Data','Seattle_Sona','seattle_ses_by_session.csv')); % Set optimization parameters for fmincon (OPTIONAL) options = optimset(@fmincon); options = optimset(options, 'TolX', 0.00001, 'TolFun', 0.00001, 'MaxFunEvals', 900000000, 'LargeScale','off');

Next, we will create comp_model() class instance for an example linear model. This requires that we specify a data frame, cell array of column names, and a model name. The Model Name must refer to function containing the model, which is located on the matlab path (See example ‘linear_model’ function below). We can also specify some additional parameters that are required for the model fitting procedure.

*nStart*– the number of iterations to repeat model estimation – selects the iteration associated with the best model fit. Higher numbers decrease the likelihood of fmincon getting stuck in a local minima, but increase the computational time.*param_min*– vector of lower bound of parameters*param_max*– vector of upper bound of parameters*esttype*– type of parameter estimation (‘SSE’ – minimize sum of squared error; ‘LLE’ – maximize log likelihood; ‘LE’ – maximize likelihood)

lin = comp_model(dat.data,dat.textdata,'linear_model','nStart',10, 'param_min',[-5, -20], 'param_max', [60, 20], 'esttype','SSE'); 911x8 comp_model array with properties: model: 'linear_model' param_min: [-5 -20] param_max: [60 20] nStart: 10 esttype: 'SSE' params: [] trial: [] dat: [911x8 double] varname: {'subj' 'group' 'sess' 'se_count' 'se_sum_intensity' 'any_action_taken' 'hamtot' 'bditot'} fname: ''

Once the comp_model() object has been created with all of the necessary setup parameters, the model can be fit to the data with following command.

lin = lin.fit_model(); 911x8 comp_model array with properties: model: 'linear_model' param_min: [-5 -20] param_max: [60 20] nStart: 10 esttype: 'SSE' params: [77x6 double] trial: [911x4 double] dat: [911x8 double] varname: {'subj' 'group' 'sess' 'se_count' 'se_sum_intensity' 'any_action_taken' 'hamtot' 'bditot'} fname: ''

This adds data to two new fields, params and trial.

*params*– is the parameters estimated for each subject. Rows are individual subjects. Columns are {‘Subject’, ‘Estimated Parameters (specific to each model)’, ‘Model Fit’, ‘AIC’, ‘BIC’}*trial*– is the trial by trial data and predicted values for all subjects stacked together.

The overall average results from the model can be quickly viewed using the summary() method.

summary(lin) Summary of Model: linear_model ----------------------------------------- Average Parameters: 18.1073 Average AIC: 35.8264 Average BIC: 36.3802 Average SSE: -1.86 Number of Subjects: 77 -----------------------------------------

The ‘params’ and ‘trial’ data frames can be written to separate .csv files. This can be helpful to open these data in a separate program such as R or Python for plotting or futher analyses.

lin.write_tables(fullfile(basedir,'Analysis','Modeling'))

The overall object instance can be saved as .mat file. This is helpful as sometimes model estimation can take a long time especially if using a high number of iterations.

lin.save(fullfile(basedir,'Analysis','Modeling','Linear_ModelFit.mat'))

The average predicted values from the model can be quickly plotted using the plot() function. It is necessary to specifiy the particular columns from obj.trial to plot as these will be specific to the data set and model. This method has some rudimentary options to customize the plot.

plot(lin, [3,4], 'title', 'Linear Model', 'xlabel','session', 'ylabel', 'Average BDI', 'legend', {'Predicted','Observed'})

**Example Model Function**

Here is an example function of a very simple linear model. Functions can be completely flexible, but need to have the free parameter (xpar) and data as inputs and output the model fit (e.g., sse). This is so fmincon can optimize the parameters for this function by minimizing the Sum of Squared Error (sse – for this example). This function needs to be in a separate function file on the matlab path.

function sse = linear_model(xpar, data) % Fit Linear Decay Treatment Model global trialout %this allows trial to be saved to comp_model() object % Model Parameters beta0 = xpar(1); % Intercept beta1 = xpar(2); % Slope % Parse Data obssx = data(:,8); %BDI symptom scores %Model Initial Values sse = 0; % initial value sum of squared error time = 1:size(data,1); time = time - mean(time); %center time variable % This model is looping through every trial. Obviously this isn't % necessary for this specific model, but it is for more dynamic models that % change with respect to time such as RL models. for t = 1:length(obssx) %Calculate symptom decay using linear decay predsx(t) = beta0 + beta1 * time(t); %linear trend of symptom change % update sum of squared error (sse) sse = sse + (predsx(t) - obssx(t))^2; end %Output trial by trial results - saved as obj.trial trialout = [ones(t,1)*data(1,1) (1:t)', obssx, predsx(1:t)']; end % model

** Model Comparison**

For a quick example of testing a hypothesis by comparing competing models we will use Matlab’s robust non-parametric signrank test to determine if the test model (i.e. linear_expect_model) fits the data significantly better than the null model (i.e. linear_model) using the get_aic() command. We use the AIC metric to compare the models as the test model has 3 free parameters compared to the null model which only has 2.

[P,H,STATS] = signrank(get_aic(lin),get_aic(lin_expect)) P = 8.8387e-08 H = 1 STATS = zval: 5.3491 signedrank: 2555

In this example, we see that the test Linear Expectation model fits the data significantly better than the null Linear model, supporting our hypothesis.

Let me know if you have any comments or suggestions on how to improve the software.

]]>

**Scientists Needs:**

- Scientists work in teams, but are mostly interested in their individual careers/projects. This is why dropbox and google apps work well. Each scientist has their own account and can work with other people, but in the end they are in control of their data and interface. Collaborations occur within and between laboratories and personnel are in a constant state of flux (i.e., primarily undergraduate, graduate, postdoc trainees). Many cloud based project management softwares are designed to be run in a stable business setting and do not work well with the ever changing laboratory landscape.
- Scientists need an interface for planning, tracking progress and discussing. These needs are obviously shared with many other disciplines, but scientists are generally accustomed to technology such as laboratory notebooks, which track progress of data collection, analyses, results, brainstorms, and literature searches.
- Scientists are not as well financed as their industry counterparts and have limited pockets of money from startup and grants to pay for software. Scientists are accustomed to academic pricing models and freeware, and are not used to the software as a service (SAAS) model, particularly when revenue sources from grants are not constant.
- Scientists are busy and won’t use things that are too complicated, take a lot of effort, or are unintuitive. This is one of the reasons why I think Apple products have such deep penetration into the academic community. Many scientists have little patience for the instability and insecurity of windows, or the complexity of linux operating systems.
- Scientists are fairly naive when it comes to management. We are trained through an apprenticeship model, and are generally unaccustomed to managing a lot of personnel and projects. In addition, we have limited resources and do not typically hire management other than lab managers. Thus, we have less training and resources to manage teams compared to other industries.
- Scientists like stability. Nobody wants to spend their precious time or money in learning technology that becomes quickly obsolete or discontinued (anyone remember google wave? or apple xserves?).

**Examples:**

Fortunately, there are a number of really nice software packages being developed that will likely be helpful in improving efficient management of projects and collaborations

Trello: A free online collaborative tool started by Joel Spolsky (one of the founders of stackoverflow) and based on the kanban philosophy. The general idea is that there are organizations with multiple people, boards for specific projects, lists within a board, and cards within lists that can be discussed and tasked to specific individuals. Each board can have different permissions and every user has unlimited boards. Kanban encourages cards to move across different stages of production represented as lists. This software is very simple and visually intuitive and I think works well for many different tasks such as brainstorming and collaborating on specific projects. This software is well suited to be adopted by scientists because it is free and based on interacting individuals rather than a central system for a particular organization. I think it nicely complements other cloud based apps such as google calendar, dropbox, and hackpad.

Pros:

- Free!
- Web interface, tablet, and mobile apps are intuitive, fast, and well implemented.
- Unlimited users and boards
- Works well for individual research projects within a lab and across labs. Relatively easily to invite outside collaborators to specific boards.
- Works well as an individual project manager board.
- Email and dropbox integration.
- Has an open API, which adds to it’s flexible nature.

Cons:

- No real dashboard feature which allows you to see upcoming tasks and updates across boards.
- Discussions are limited to specific cards. There is no general discussion board.
- Calendar app is only for projects associated with a specific board. No integrated shared calendar (can link calendar feeds to google calendar though).
- No collaborative writing option. (but try hackpad!)

Basecamp: One of the oldest and most popular cloud based collaborative projective management softwares and resulted in the development of Ruby on Rails. This software has projects, which can have tasks, discussions, files, and text documents. Overall, I think this is a great software for many of the important aspects of lab based collaborations. It does an excellent job of facilitating discussions and centralizing scheduling. However, it might be difficult to get all members of a large lab to participate as it is expensive, has a limited number of projects, and stays with the organization not the user.

Pros:

- Very intuitive and clean interface
- Very nice permissions. People can be clustered into groups and everything can different permissions depending on who should be involved.
- Can discuss anything (e.g., tasks, files, etc.). Discussions do not have to be tied to a specific task.
- Nice calendar integration. Can have separate calendars for specific projects, lab events, personnel availability, resource availability, etc.
- Nice mobile apps.
- Nice individual user dashboard to see upcoming tasks and updates on all projects
- Unlimited users.
- Really nice email integration and there are lots of third party integrations.
- Can pay for individual boards at $15 a month per board if you leave the lab.

Cons:

- Expensive. ($50 a month for 40 projects, $3k a year for unlimited projects)
- Limited projects. This was a very limiting factor in our lab where we needed at least 100 boards just for our ongoing research projects.
- Not very convenient to take with you if you leave a lab that is highly integrated with basecamp.
- Poor collaborative writing implementation.

**Additional Software Packages:**

- Asana
- ActiveCollab
- Producteev
- Flow
- Redbooth
- Huddle
- Proofhub
- Smart sheet
- Wrike
- crowdbase
- Binfire
- Breeze
- BCSocial
- Freedcamp
- Podio
- Wridea
- Gantter
- OpenProject
- LibrePlan
- ProjectLibra

**Thoughts?**

Anyone else had positive or negative experiences with a particular software in the context of a scientific collaboration?

]]>