**Association Study**: A study of the relationships between outcomes and other variables.

**AYP**: Adequate Yearly Progress

**Categorical Data**: Variables whose values are categories rather than measured quantities, such as gender, college major, or language proficiency level.

**Causation**: A causal relation between two variables means that a change in one variable directly causes a change in the other variable.

**Cohort**: persons who share something in common; in education this is most often a group of students who entered in a particular year.

**Correlation**: a measure of the strength and direction of a linear relationship between two variables.
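As an illustration, one common correlation measure is the Pearson coefficient, which ranges from -1 to 1. The sketch below computes it by hand; the function name `pearson_r` and the study-hours data are made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of the deviations from each mean
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (zero if a variable never varies)
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: hours studied and exam scores for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 78]
r = pearson_r(hours, scores)
print(round(r, 3))  # close to 1: a strong positive linear relationship
```

A value near 0 would indicate little or no linear relationship, although the variables could still be related in a non-linear way.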

**Cut score**: score that separates test takers into categories, such as a passing score and a failing score. For instance, it can be the minimum score on a standardized exam required for entry into an institution.
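A minimal sketch of applying a cut score, assuming that a score equal to the cut score counts as passing. The cut score of 70 and the scores are hypothetical.

```python
def classify(score, cut_score):
    """Label a score as 'pass' if it meets or exceeds the cut score."""
    return "pass" if score >= cut_score else "fail"

CUT_SCORE = 70  # hypothetical minimum passing score
scores = [55, 70, 82, 69]
labels = [classify(s, CUT_SCORE) for s in scores]
print(labels)  # ['fail', 'pass', 'pass', 'fail']
```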

**Data**: information that is gathered from observations. In statistics, data are the numbers that result from the information that was gathered.

**Dataset**: a collection of data. It is presented as a table where the columns represent variables and each row represents a case.

**Descriptive statistics**: summarize and describe variables and the relationships among them, as opposed to inferential statistics, which use a sample drawn from the population to determine whether the relationships found can be generalized to the entire population.

**Drop-out rate**: the rate at which students drop out of a course in a given term.

**Effect**: a measured relationship, difference, or statistic in a statistical study. For example, this could be a correlation, a mean or median, or a difference between two means.

**ERC**: Education Research Centers.

**Experimental Study**: A quantitative study in which the researcher manipulates one or more variables to study the causal effect on the research subjects.

**Failure rate**: the rate at which students fail a course within a given term.

**Family Educational Rights and Privacy Act (FERPA)**: a U.S. federal law that governs access to, and the release of, students’ education records and personal information.

**First-Time-In-College (FTIC)**: a student entering college for the first time.

**Frequency**: a tally or count of observations in each category of a variable.
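For example, a frequency table for a categorical variable can be tallied with Python's standard `collections.Counter`. The list of majors is invented for illustration.

```python
from collections import Counter

# Hypothetical observations of one categorical variable (college major)
majors = ["Biology", "History", "Biology", "Math", "Biology", "History"]

frequency = Counter(majors)  # maps each category to its count
for category, count in frequency.most_common():
    print(category, count)
```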

**Graduation rate**: the rate at which students who begin as freshmen in the fall term graduate within a set number of years. This is usually calculated for 4-year, 5-year, and 6-year periods.
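A rough sketch of the calculation for one entering cohort follows. It assumes each student is recorded as the number of years taken to graduate, or `None` if the student has not graduated; institutions may define the cohort and the time window differently, and the data here is hypothetical.

```python
def graduation_rate(years_to_degree, within_years):
    """Share of the cohort that graduated within the given number of years."""
    graduated = sum(
        1 for years in years_to_degree
        if years is not None and years <= within_years
    )
    return graduated / len(years_to_degree)

# Hypothetical cohort of ten students; None means no degree earned
cohort = [4, 4, 5, None, 6, 4, None, 5, 4, 7]

for window in (4, 5, 6):
    print(f"{window}-year rate: {graduation_rate(cohort, window):.0%}")
```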

**Index**: a specially designed statistic that is typically based on several variables. An index is usually either a weighted sum or mean of several variables. Indexes are often used as a convenient way to summarize a large set of data.
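To make the weighted-mean form concrete, here is a small sketch. The variable names, values, and weights are all hypothetical and do not correspond to any official index.

```python
def weighted_index(values, weights):
    """Weighted mean of several variables, keyed by variable name."""
    total_weight = sum(weights.values())
    weighted_sum = sum(values[name] * weights[name] for name in weights)
    return weighted_sum / total_weight

# Hypothetical measures for one school, each on a 0-1 scale
school = {"attendance": 0.95, "pass_rate": 0.80, "graduation_rate": 0.70}
# Hypothetical weights: the two outcome rates count twice as much as attendance
weights = {"attendance": 1, "pass_rate": 2, "graduation_rate": 2}

index = weighted_index(school, weights)
print(round(index, 2))  # 0.79
```

The choice of weights is a judgment call by whoever designs the index, so two indexes built from the same variables can rank cases differently.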

**Longitudinal Study**: an association study that includes multiple observations of subject characteristics over time.

**Mean**: the average of values for the cases in a variable. The mean uses every value, so it is sensitive to extreme values. When a variable is strongly skewed, or is operationalized on an ordinal scale, the median is often the better summary.

**Median**: score separating the top 50% of the scores from the bottom 50% of the scores.
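How the mean and median respond to a single extreme value can be seen with Python's standard `statistics` module. The salary figures (in thousands) are invented for this example.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [400]  # add one extreme value

print(mean(salaries), median(salaries))          # 35 35
print(mean(with_outlier), median(with_outlier))  # mean jumps, median barely moves
```

Adding one extreme case pulls the mean from 35 to roughly 95.8, while the median only shifts from 35 to 36.5.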

**N**: Sample size. All else being equal, a larger sample gives more precise estimates, although it does not correct for a biased sampling method.

**Operationalization**: the process of defining concepts. In statistics this consists of defining how a concept will be measured as a variable.

**Pass rate**: the rate at which students pass a course within a given term.

**Percentage**: a part of a hundred. It is calculated as the proportion times 100.

**Percentile**: a score at or below which a given percentage of the scores lies.
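The sketch below computes the percentile rank of a score using the "at or below" definition given here. Other conventions exist (for example, counting only scores strictly below), so results from statistical software may differ slightly. The scores are hypothetical.

```python
def percentile_rank(scores, score):
    """Percentage of scores that are at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

# Hypothetical test scores for ten students
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
rank = percentile_rank(scores, 80)
print(rank)  # 60.0, so a score of 80 is at the 60th percentile
```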

**Persistence rate**: the rate measuring how many students return from the fall semester to the spring semester. This can include freshman, sophomore, junior and senior students.

**PEIMS**: The Public Education Information Management System.

**Population**: all the cases that we are interested in.

**Proportion**: the number of cases in a category divided by the total number of cases.
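A short sketch of turning counts into proportions, and proportions into percentages by multiplying by 100. The helper name `proportions` and the pass/fail data are made up for illustration.

```python
from collections import Counter

def proportions(observations):
    """Proportion of cases in each category (count / total cases)."""
    counts = Counter(observations)
    total = len(observations)
    return {category: count / total for category, count in counts.items()}

# Hypothetical course outcomes for four students
grades = ["pass", "pass", "fail", "pass"]

props = proportions(grades)
percentages = {category: p * 100 for category, p in props.items()}
print(props)        # {'pass': 0.75, 'fail': 0.25}
print(percentages)  # {'pass': 75.0, 'fail': 25.0}
```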

**Practical Significance**: tells us whether a difference has some value in the real world.

**Random sampling**: A sampling method where every member of the population has an equal chance of being selected for the study.

**Rates**: the number of subjects per 1000 or per 10,000 or per 100,000, depending on what is most relevant. Percentage is equivalent to rate per 100. Rates are used in place of percentages when probabilities are very small and/or for convenience of interpretation.
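The arithmetic can be sketched as follows. The function name `rate_per` and the counts are hypothetical.

```python
def rate_per(events, population, base=1000):
    """Number of events per `base` members of the population."""
    return events * base / population

# Hypothetical: 12 events observed among 48,000 students
print(rate_per(12, 48_000, base=100))      # 0.025 (the percentage)
print(rate_per(12, 48_000, base=100_000))  # 25.0 per 100,000
```

Here "25 per 100,000" is easier to read and compare than "0.025%", which is why a larger base is chosen when events are rare.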

**Raw data**: data that is not transformed or changed in any way.

**Reliability**: the quality of a measure that produces similar results each time it is taken on the same subject under the same circumstances. For example, a reliable test gives about the same score each time it is taken, provided the test taker's underlying content knowledge has not changed.

**Retention rate**: the rate measuring how many freshmen continue into their sophomore year.

**Sample**: A subset of cases from a population.

**STAAR**: State of Texas Assessments of Academic Readiness (this will be replacing the TAKS test)

**Statistical Significance**: A result is statistically significant when it is deemed unlikely to have occurred by chance alone. That is, the effect is larger than we would typically expect from the natural variability in the data.
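One way to make "unlikely to have occurred by chance" concrete is an exact permutation test, sketched below. This is only one of several approaches to significance testing and is shown under stated assumptions: the data is hypothetical, the test is one-sided, and the groups are small enough to enumerate every possible regrouping.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """One-sided exact permutation test on the difference in means.

    Returns the share of all possible regroupings of the pooled scores
    whose mean difference (A minus B) is at least as large as observed.
    """
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    extreme = 0
    splits = 0
    # Try every way of choosing which pooled scores belong to group A
    for chosen in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in chosen)
        diff = sum_a / n_a - (total - sum_a) / n_b
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        splits += 1
    return extreme / splits

# Hypothetical exam scores for tutored and non-tutored students
tutored = [78, 85, 90, 88]
control = [70, 72, 75, 80]
p = permutation_p_value(tutored, control)
print(round(p, 3))  # 0.029: only 2 of the 70 possible splits are this extreme
```

A small value such as this suggests the observed difference would rarely arise from random regrouping alone. Whether the difference matters in practice is a separate question of practical significance.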

**Statistics**: numerical measures based on data.

**Success rate**: the rate at which students successfully complete a course.

**TAKS**: Texas Assessment of Knowledge and Skills.

**TEA**: Texas Education Agency.

**THECB**: Texas Higher Education Coordinating Board.

**TPEIR**: Texas PK-16 Public Education Information Resource.

**Validity**: a quality of data that ensures it measures what it is intended to measure. For example, an IQ test should really measure intelligence and not other characteristics.

**Variables**: characteristics on which cases in a dataset differ. Variables “vary” or take on different values.