Created by Matthew Evans
almost 2 years ago
Question | Answer |
--- | --- |
alternative hypothesis | The opposite of the null hypothesis, or a potential result that the analyst expects to find. |
Benford’s law | The principle that in any large, randomly produced set of natural numbers, there is an expected distribution of the first, or leading, digit, with 1 being the most common, 2 the next most common, and so on down to 9 as the least common. |
causal modeling | A data approach similar to regression, but used to test for cause-and-effect relationships between multiple variables. |
classification | A data approach that attempts to assign each unit in a population to one of a small set of categories, often to help make predictions. |
clustering | A data approach that attempts to divide individuals (like customers) into groups (or clusters) in a useful or meaningful way. |
co-occurrence grouping | A data approach that attempts to discover associations between individuals based on transactions involving them. |
data reduction | A data approach that attempts to reduce the amount of information that needs to be considered to focus on the most critical items (e.g., highest cost, highest risk, largest impact, etc.). |
decision boundaries | The lines or surfaces that mark the split between one class and another in a classification model. |
decision support system | An information system that supports decision-making activity within a business by combining data and expertise to solve problems and perform calculations. |
decision tree | A tool that splits data into successively smaller groups by applying a series of decision rules. |
descriptive analytics | Procedures that summarize existing data to determine what has happened in the past. Some examples include summary statistics (e.g., Count, Min, Max, Average, Median), distributions, and proportions. |
diagnostic analytics | Procedures that explore the current data to determine why something has happened the way it has, typically by comparing the data to a benchmark. For example, these allow users to drill down into the data and see how they compare to a budget, a competitor, or a trend. |
digital dashboard | An interactive report showing the most important metrics to help users understand how a company or an organization is performing. Often created using Excel or Tableau. |
dummy variable | A numerical value (0 or 1) to represent categorical data in statistical analysis; values assigned a 1 indicate the presence of something and 0 represents the absence. |
effect size | Used in addition to statistical significance in statistical testing; effect size demonstrates the magnitude of the difference between groups. |
interquartile range (IQR) | A measure of variability. The data are first divided into four equal parts (quartiles); the IQR is the range spanned by the middle two quartiles, that is, the difference between the third and first quartiles. |
link prediction | A data approach that attempts to predict a relationship between two data items. |
null hypothesis | An assumption that the hypothesized relationship does not exist, or that there is no significant difference between two samples or populations. |
overfitting | A modeling error that occurs when the derived model fits a limited set of data points too closely, capturing noise rather than the underlying pattern. |
predictive analytics | Procedures used to generate a model that can be used to determine what is likely to happen in the future. Examples include regression analysis, forecasting, classification, and other predictive modeling. |
prescriptive analytics | Procedures that work to identify the best possible options given constraints or changing conditions. These typically include developing more advanced machine learning and artificial intelligence models to recommend a course of action, or optimizing, based on constraints and/or changing conditions. |
profiling | A data approach that attempts to characterize the “typical” behavior of an individual, group, or population by generating summary statistics about the data (including mean, standard deviations, etc.). |
regression | A data approach that attempts to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model. |
similarity matching | A data approach that attempts to identify similar individuals based on data known about them. |
structured data | Data that are organized and reside in a fixed field with a record or a file. Such data are generally contained in a relational database or spreadsheet and are readily searchable by search algorithms. |
summary statistics | Statistics that describe the location, spread, shape, and dependence of a set of observations. These commonly include the count, sum, minimum, maximum, mean or average, standard deviation, median, quartiles, correlation, covariance, and frequency of a specific measurable value. |
supervised approach/method | Approach used to learn more about the basic relationships between independent and dependent variables that are hypothesized to exist. |
support vector machines | A discriminating classifier defined by a separating hyperplane, chosen to maximize the margin (the widest “pipe”) between classes. |
test data | A set of data used to assess the degree and strength of a predicted relationship established by the analysis of training data. |
time series analysis | A predictive analytics technique used to predict future values based on past values of the same variable. |
training data | Existing data that have been manually evaluated and assigned a class, which assists in classifying the test data. |
underfitting | A modeling error that occurs when the derived model fits the data points poorly, missing the underlying pattern. |
unsupervised approach/method | Approach used for data exploration looking for potential patterns of interest. |
XBRL (eXtensible Business Reporting Language) | A global standard for exchanging financial reporting information that uses XML. |
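The Benford’s law card above states that leading digits follow an expected distribution. A minimal sketch of that distribution, using the standard formula P(d) = log₁₀(1 + 1/d) (the function name is illustrative, not from the source):

```python
import math

def benford_probability(d):
    """Expected probability that a natural number's leading digit is d (1-9)."""
    return math.log10(1 + 1 / d)

# 1 is the most common leading digit (~30.1%); 9 is the least (~4.6%).
expected = {d: benford_probability(d) for d in range(1, 10)}
```

Auditors often compare the observed first-digit frequencies in a data set (e.g., invoice amounts) against these expected proportions to flag possible fabricated numbers.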
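Two of the cards above, the interquartile range and dummy variables, are easy to illustrate concretely. A short sketch using Python’s standard library (the sample values and category names are made up for illustration):

```python
import statistics

# Hypothetical sample data (illustrative values only).
values = [2, 4, 4, 5, 7, 9, 11, 12, 13, 20]

# statistics.quantiles with n=4 returns the three quartile cut points;
# the IQR is the range spanned by the middle two quartiles (Q3 - Q1).
q1, median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Dummy variable: encode a categorical field as 0/1 indicators, where
# 1 marks the presence of the category and 0 its absence.
regions = ["East", "West", "East", "South"]
is_east = [1 if r == "East" else 0 for r in regions]
```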