Raw data is constantly being collected by businesses in all fields, from websites visited, purchases made, videos watched, and more. Data remains raw until it is mined and the information it contains is put to use. Mining data to make sense of it has applications across many fields of industry and academia.
Computers enable us to process data to turn it into information for decision making and research. Computers often identify patterns in data that individuals would not be able to detect.
Cleaning
One way computers are very helpful is in cleaning data: removing corrupt data, removing or repairing incomplete data, and verifying ranges or dates, among other steps. Removing or flagging invalid data keeps it from distorting later analysis. Computers can clean large data sets far more quickly and consistently than humans can.
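A minimal sketch of what automated cleaning might look like, assuming a small set of hypothetical patient records stored as text. The record layout, the `clean` function, and the 90 to 110 °F "plausible" range are illustrative assumptions, not part of the original slides. The sketch only removes and flags bad records; repairing incomplete data (for example, filling in a missing value) is not shown.

```python
from datetime import date

# Hypothetical raw records: (patient_id, temperature_f, visit_date_string)
RAW_RECORDS = [
    ("p1", "98.4", "2024-03-01"),
    ("p2", "", "2024-03-02"),       # incomplete: missing temperature
    ("p3", "1040", "2024-03-02"),   # out of range: likely a typing error
    ("p4", "99.1", "2024-13-45"),   # invalid date
    ("p5", "abc", "2024-03-03"),    # corrupt: not a number
    ("p6", "100.2", "2024-03-04"),
]

# Assumed plausible range for a body temperature in degrees Fahrenheit.
MIN_TEMP_F, MAX_TEMP_F = 90.0, 110.0


def clean(records):
    """Split records into (valid, flagged) lists, recording why each was flagged."""
    valid, flagged = [], []
    for pid, temp_text, date_text in records:
        if not temp_text:
            flagged.append((pid, "missing temperature"))
            continue
        try:
            temp = float(temp_text)
        except ValueError:
            flagged.append((pid, "corrupt temperature"))
            continue
        if not MIN_TEMP_F <= temp <= MAX_TEMP_F:
            flagged.append((pid, "temperature out of range"))
            continue
        try:
            visit = date.fromisoformat(date_text)  # rejects impossible dates
        except ValueError:
            flagged.append((pid, "invalid date"))
            continue
        valid.append((pid, temp, visit))
    return valid, flagged


valid, flagged = clean(RAW_RECORDS)
```

Only `p1` and `p6` survive; the other four records are flagged with a reason, so a person can decide whether to discard or repair them.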
Slide 4
How Computers Help Process Data
Filtering
Computers can easily filter data. Subsets that meet specific conditions can be identified and extracted to help people make the data meaningful.
For example, all temperature values greater than 98.6 °F might be meaningful and need further processing, or perhaps only a count of how many such values appear in the entire data set is needed.
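The temperature example can be sketched in a few lines, assuming the readings are already a clean list of numbers in degrees Fahrenheit (the sample values are made up). It shows both uses mentioned above: extracting the matching subset for further processing, and simply counting it.

```python
# Hypothetical sample of temperature readings in degrees Fahrenheit.
temperatures = [97.9, 98.6, 99.1, 100.4, 98.2, 101.0, 98.7]

THRESHOLD_F = 98.6

# Extract the subset that meets the condition (strictly greater than 98.6).
above_threshold = [t for t in temperatures if t > THRESHOLD_F]

# Sometimes only the count matters, not the records themselves.
count_above = sum(1 for t in temperatures if t > THRESHOLD_F)

print(above_threshold)  # [99.1, 100.4, 101.0, 98.7]
print(count_above)      # 4
```

Note that 98.6 itself is excluded because the condition is "greater than," not "greater than or equal to."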
Slide 5
How Computers Help Process Data
Classifying
Computers can help people make sense of large data sets by grouping data with common values and features. These classifications are based on predetermined criteria related to how the data will be used. A single criterion or multiple criteria may define the groups, depending on why the data was collected.
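A small sketch of classifying records by predetermined criteria, first with one criterion and then with two combined. The customer data, the age bands, and the cut-off of six purchases for a "frequent" shopper are all hypothetical choices made for illustration; in practice the criteria would come from the purpose of the analysis.

```python
from collections import defaultdict

# Hypothetical customer records: (name, age, purchases_last_year)
customers = [
    ("Ana", 17, 3),
    ("Ben", 34, 12),
    ("Cara", 52, 1),
    ("Dev", 29, 0),
    ("Eli", 67, 8),
]


def age_group(age):
    """First criterion: age bands chosen in advance (assumed values)."""
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"


def shopper_type(purchases):
    """Second criterion: assumed cut-off of 6+ purchases for a frequent shopper."""
    return "frequent" if purchases >= 6 else "occasional"


# Group by a single criterion.
by_age = defaultdict(list)
for name, age, _ in customers:
    by_age[age_group(age)].append(name)

# Group by two criteria combined into one key.
by_age_and_habit = defaultdict(list)
for name, age, purchases in customers:
    key = (age_group(age), shopper_type(purchases))
    by_age_and_habit[key].append(name)
```

Using one criterion puts Ben, Cara, and Dev in the same "adult" group; adding the second criterion splits Ben (frequent) from Cara and Dev (occasional).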
Slide 6
How Computers Help Process Data
Patterns
Computers can identify patterns in data that people either are unable to recognize or cannot see because there is too much data to process by hand. This process is known as data mining. New discoveries and understandings are often made this way. When new or unexpected patterns emerge, the data has been transformed into information that people can begin to interpret. By making it possible to process huge amounts of data, computers help people make sense of it.
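One simple kind of pattern a computer can find is which items are often bought together. The sketch below counts how many shopping baskets contain each pair of items and keeps the pairs that appear at least a minimum number of times. This is a simplified form of frequent-itemset counting, one of many data mining techniques; the baskets and the `MIN_SUPPORT` threshold of 3 are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets; each set is one customer's order.
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "cereal"},
    {"bread", "milk", "butter"},
    {"eggs", "bacon"},
    {"bread", "butter"},
]

# Assumed threshold: a pair counts as a pattern if it appears in at least 3 baskets.
MIN_SUPPORT = 3

pair_counts = Counter()
for basket in baskets:
    # sorted() gives each pair a consistent order, so ("bread", "milk")
    # and ("milk", "bread") are counted as the same pair.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

frequent_pairs = {pair: n for pair, n in pair_counts.items() if n >= MIN_SUPPORT}
print(frequent_pairs)  # {('bread', 'milk'): 3}
```

With six baskets a person could spot this by eye, but the same loop works unchanged on millions of orders, which is where computers make the pattern visible.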
A company purchases a large chunk of data from a social media site. If they want to analyze the data to learn more about potential customers, what technique should they use?
A. Modeling to test different hypotheses about what data could be present.
B. Data mining to identify patterns and relationships in the data for further analysis.
C. Maximization to get the highest return on their purchase of the data.
D. Data processing to use the data with existing company software to see if it will run on their systems, or if new ones will need to be developed.
Slide 9
Answer & Explanation
B. Data mining is the analysis of data to identify patterns and connections. Companies can then use those patterns to identify business opportunities to pursue or threats to avoid.
Slide 10
Review
Which of the following techniques could be used to further analyze patterns that emerged during data mining?
A. Classifying data to categorize it into distinct groups.
B. Cleaning data to determine which data to include in the processing.
C. Clustering data to separate data with similarities into subclasses.
D. Filtering to set conditions so that only records meeting those criteria are included.
Slide 11
Answer & Explanation
All of these techniques can be used for further analysis.
Slide 12
Review
Why is cleaning data important?
A. It ensures incomplete data does not hide or skew results.
B. It removes bad or incomplete data.
C. It repairs bad or incomplete data.
Slide 13
Answers & Explanation
Data needs to be cleaned to remove or repair corrupt or incomplete data to ensure valid data is used for research and analysis.