Aggregate Analysis of ClinicalTrials.gov (AACT) Database

The Clinical Trials Transformation Initiative (CTTI) created the database for the Aggregate Analysis of ClinicalTrials.gov (AACT) to facilitate research on all, or groups of, studies registered on ClinicalTrials.gov. This publicly available relational database, updated daily with content from ClinicalTrials.gov, has been used in numerous publications to characterize the clinical trials landscape, report on the state of research in specific disease areas, and investigate compliance with reporting requirements.

The AACT database can be accessed directly via the cloud, and static copies can be downloaded. Curated datasets from select projects and publications are also available so that others interested in similar topics can benefit from and build upon the previous work.

Source code for AACT is available via GitHub.

AACT was upgraded in 2017 to its current version. Historical AACT database extracts from 2010-2016 are also available.

Duke Cardiac Catheterization Datasets

The Duke Databank for Cardiovascular Disease (DDCD) is a clinical care database, established in the 1960s by a research team within the Duke Division of Cardiology. As a component of the DDCD, data were collected on patients undergoing cardiac catheterization for suspected coronary artery or valvular heart disease, and on patients undergoing cardiac surgery. These data were used to generate reports for the patients’ medical records, but were also made available for clinical research, for example, research studies to improve the medical care of patients with coronary artery disease. The Duke-owned DDCD is one of the largest and oldest cardiovascular databanks in the world. Duke researchers have conducted several cardiology outcome studies using this source data.


Duke Cardiac Catheterization Research Dataset (DukeCath)

The DukeCath analysis dataset, extracted from the DDCD, includes records and variables for adult patients undergoing cardiac catheterization procedures at Duke between 1985 and 2013. This dataset has been de-identified to remove individually identifiable protected health information. The DukeCath dataset is suitable for clinical and/or methodological research and publication purposes. It contains one record per catheterization procedure and includes patient characteristics, catheterization results, interventional treatments, and long-term outcomes data that allow for assessment of the relationships between variables and outcomes, as well as examination of time trends. The dataset includes more than 150,000 catheterization procedures in more than 80,000 unique patients.
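Because the dataset holds one record per procedure rather than one per patient, procedure counts and patient counts differ, and time trends are tallied per procedure. The following is a minimal sketch of that distinction, not the DukeCath layout itself: the field names `patient_id` and `year` and all values are invented for illustration.

```python
from collections import Counter

# Hypothetical rows, one per catheterization procedure.
# Field names and values are assumptions, not the actual DukeCath variables.
procedures = [
    {"patient_id": "P1", "year": 1990},
    {"patient_id": "P1", "year": 1995},
    {"patient_id": "P2", "year": 1995},
    {"patient_id": "P3", "year": 2010},
]


def summarize(rows):
    """Return (procedure count, unique patient count, procedures per year)."""
    n_procedures = len(rows)
    n_patients = len({row["patient_id"] for row in rows})
    per_year = Counter(row["year"] for row in rows)
    return n_procedures, n_patients, dict(sorted(per_year.items()))


n_procedures, n_patients, per_year = summarize(procedures)
```

A patient who appears in several rows contributes once to the patient count but once per row to the yearly tallies, so any per-patient analysis would need to account for repeated procedures.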

Users whose proposals are approved by a review committee will use the data within a secured platform equipped with statistical and visual analytics tools. Thus far, we share this dataset through SAS and AHA platforms. This dataset has been approved for sharing by Duke University Health System and the Institutional Review Board. Allow one week for a decision on the proposal and a fulfillment timeline.


Duke Cardiac Catheterization Educational Dataset (DukeCathR)

The DukeCath educational dataset, a de-identified and anonymized dataset extracted from the DDCD, has been intentionally modified such that any meaningful or publishable clinical interpretations are impossible. DukeCathR gives instructors and students exposure to "real world" data for instructional analyses. The goal is to provide instructors with a resource to help teach students how to approach statistical analyses using real patient data. Any teacher, instructor, or student can request this dataset for educational purposes.

This dataset has been approved for sharing by Duke University Health System and the Institutional Review Board. Requests for datasets can take up to one business week to fulfill.

EpiGen Study Seeks Answers About Epilepsy in Genetics

Epilepsy, a common neurological condition marked by seizures, is a complex group of diseases, and many questions in this field of research remain unanswered. It is difficult to predict which factors cause epilepsy, and equally challenging to discern patterns in the development of the condition; for example, although some individuals with brain malformations caused by single-gene defects do develop epilepsy, not every individual in this population is affected. It is also difficult to determine how individuals will react to treatment; 30 percent of epilepsy patients do not respond to antiepileptic drugs and continue to have seizures.

The DCRI is acting as the data management center for the EpiGen study, which researchers hope will answer some of these questions through the study of genetics. Researchers will conduct a specific type of gene mapping to study gene associations. The study will help them to examine which gene variants may be able to predict which antiepileptic drugs are both safest and most effective for individual patients. The study will also explore which gene variants are associated with refractory epilepsy, which could help identify the physiological causes of refractory epilepsy.

Pharmacokinetics of Clindamycin and Trimethoprim-Sulfamethoxazole in Infants and Children (PBPK) Datasets

Children go through many developmental changes during childhood, and these changes in their physiology need to be accounted for when determining accurate pediatric dosing. If developmental changes are not factored into dosing decisions, drug efficacy and safety could be negatively impacted. A new study seeks to evaluate a platform to validate a type of modeling that could be valuable in identifying the most accurate dosages during pediatric clinical trials.

Population physiologically-based pharmacokinetic (PBPK) models account for many different types of information, such as drug-specific factors like protein binding, systems-specific factors like blood flow, and other factors like genetic variants. All this information taken together can help investigators determine the dose-exposure relationship of drugs and how that changes as children grow up.

PBPK models could alleviate the burden of pediatric clinical trials by requiring fewer children for each trial while maximizing dose-based safety and efficacy. The current study seeks to evaluate a platform to validate the use of PBPK models in pediatric clinical trials. The study will test clindamycin and Bactrim (also known as TMP-SMX), which are among the most commonly used drugs to treat gram-positive infections in infants and children. The two drugs are ideal candidates for PBPK model evaluation because of their differing physico-chemical properties and elimination pathways.

The raw datasets include records collected from this trial (ClinicalTrials.gov Identifier: NCT02475876) on all 51 patients enrolled, ranging in age from approximately 3 months to 16 years. The data have been de-identified to remove individually identifiable protected health information (PHI) and can be used for clinical and/or methodological research and publication. The datasets contain information that sites originally entered into the electronic Case Report Form (eCRF) during the study, or raw pharmacokinetic (PK) data entered by the lab that analyzed blood or urine samples. Depending on the specific dataset, the structure may be one record per subject or multiple records per subject. The datasets provide information on adverse events, labs, concomitant medications, demographics, drug dosing, patient disposition, inclusion/exclusion criteria, medical history, PK data, and standard of care microbiological evaluations. Relationships between concomitant medications, demographics, drug dosing, PK levels, and any other data of interest can be assessed by linking records via the de-identified subject number. The datasets include over 350 PK samples.
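Linking a one-record-per-subject dataset to a multiple-records-per-subject dataset via the subject number might look like the sketch below. It is an illustration only: the field names (`subject_id`, `age_years`, `hours_post_dose`, `concentration`) and all values are invented and do not reflect the actual variable names or data in the study datasets.

```python
# Hypothetical one-record-per-subject dataset (for example, demographics).
demographics = [
    {"subject_id": "S001", "age_years": 0.5},
    {"subject_id": "S002", "age_years": 12.0},
]

# Hypothetical multiple-records-per-subject dataset (for example, PK samples).
pk_samples = [
    {"subject_id": "S001", "hours_post_dose": 1.0, "concentration": 2.1},
    {"subject_id": "S001", "hours_post_dose": 4.0, "concentration": 0.9},
    {"subject_id": "S002", "hours_post_dose": 1.0, "concentration": 3.4},
]


def link_by_subject(one_per_subject, many_per_subject, key="subject_id"):
    """Attach each multi-record row to its matching per-subject row."""
    lookup = {row[key]: row for row in one_per_subject}
    linked = []
    for row in many_per_subject:
        subject = lookup.get(row[key])
        if subject is None:
            continue  # no matching subject record; skip the row
        linked.append({**subject, **row})
    return linked


linked = link_by_subject(demographics, pk_samples)
```

Each linked row carries both the subject-level fields and the sample-level fields, so a PK level can be examined alongside, for instance, the subject's age.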