Big Data Analytics

Description

Mind map on Big Data Analytics, created by chandrikasweety9 on 02-01-2014.

Resource Summary

Big Data Analytics

Notes:

  • Examining large amounts of data of various types to uncover hidden patterns, unknown correlations, and other useful information.
  1. Big data

    Notes:

    • A general term used to describe unstructured and semi-structured data. Data volumes at this scale are usually measured in petabytes and exabytes.
    • A petabyte (PB) is a measure of memory or storage capacity. In binary terms it is 2 to the 50th power bytes; in decimal terms it is approximately a thousand terabytes.
    • An exabyte (EB) is a large unit of computer data storage. In binary terms it is 2 to the 60th power bytes, approximately one quintillion bytes; in decimal terms it is a billion gigabytes.
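The unit sizes in the notes above can be checked with a few lines of arithmetic. This is a minimal sketch; the constant names are illustrative and not part of the original mind map.

```python
# Binary (power-of-two) and decimal (power-of-ten) sizes of the units above.
# All constant names here are illustrative assumptions.
GIGABYTE_DECIMAL = 10**9
TERABYTE_DECIMAL = 10**12
PETABYTE_DECIMAL = 10**15
EXABYTE_DECIMAL = 10**18

PETABYTE_BINARY = 2**50  # 1,125,899,906,842,624 bytes
EXABYTE_BINARY = 2**60   # 1,152,921,504,606,846,976 bytes

# A decimal petabyte is a thousand terabytes.
terabytes_per_petabyte = PETABYTE_DECIMAL // TERABYTE_DECIMAL

# A decimal exabyte is a billion gigabytes.
gigabytes_per_exabyte = EXABYTE_DECIMAL // GIGABYTE_DECIMAL

# The binary sizes are somewhat larger than the decimal ones.
petabyte_overshoot = PETABYTE_BINARY / PETABYTE_DECIMAL  # about 1.126
exabyte_overshoot = EXABYTE_BINARY / EXABYTE_DECIMAL     # about 1.153

print(terabytes_per_petabyte, gigabytes_per_exabyte)
```

The binary and decimal figures differ by roughly 13 to 15 percent at this scale, which is why the notes call the decimal values approximate.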
    1. Unstructured data

      Notes:

      • A general label for any corporate information that does not reside in a database. There are two types: textual and non-textual.
      • Textual unstructured data is generated in media like email messages, PowerPoint presentations, Word documents, collaboration software and instant messages.
      • Non-textual unstructured data is generated in media like JPEG images, MP3 audio files and Flash video files.
      1. Primary goal

        Notes:

        • To discover repeatable business patterns.
      2. Primary goal

        Notes:

        • To help companies make better business decisions by enabling data scientists and other users to analyze huge volumes of transaction data, as well as other data sources that may be left untapped by conventional business intelligence (BI) programs.
        • A data scientist is a job title for an employee or business intelligence (BI) consultant who excels at analyzing data, particularly large amounts of data, to help a business gain a competitive edge.  
        • A data scientist possesses a combination of analytic, machine learning, data mining and statistical skills as well as experience with algorithms and coding. They have the ability to explain the significance of data in a way that can be easily understood by others. 
        1. Technologies
          1. NoSQL

            Notes:

            • A NoSQL database, also called Not Only SQL, is an approach to data management and database design that's useful for very large sets of distributed data.
            • NoSQL is especially useful when an enterprise needs to access and analyze massive amounts of unstructured data or data that's stored remotely on multiple virtual servers in the cloud.
            • One of the most popular NoSQL databases is Apache Cassandra. Cassandra, which was once Facebook's proprietary database, was released as open source in 2008. Other NoSQL implementations include SimpleDB, Google Bigtable, MemcacheDB, and Voldemort. Apache Hadoop and MapReduce are often listed alongside them, although they are data processing frameworks rather than databases. Companies that use NoSQL include Netflix, LinkedIn and Twitter.
            1. Hadoop

              Notes:

              • Hadoop was created by Doug Cutting and Mike Cafarella. It is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
              • It is part of the Apache project sponsored by the Apache Software Foundation.
              1. MapReduce

                Notes:

                • MapReduce is a software framework that allows developers to write programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers. It was developed at Google for indexing Web pages and replaced their original indexing algorithms and heuristics in 2004.
                • This framework is divided into two parts:
                  1. Map, a function that parcels out work to different nodes in the distributed cluster.
                  2. Reduce, a function that collates the work and resolves the results into a single value.
                • The MapReduce framework is fault-tolerant because each node in the cluster is expected to report back periodically with completed work and status updates. If a node remains silent for longer than the expected interval, a master node makes note and re-assigns the work to other nodes.
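The Map step described above, together with the Reduce step that gives the framework its name, can be sketched as a word count running in a single process. This is only an illustration of the programming model, not Hadoop's or Google's API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and the sketch does not model the cluster, the master node, or the fault-tolerance behavior.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    pairs = []
    # In a real cluster each chunk would be handed to a different node;
    # here the chunks are simply processed one after another.
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data analytics", "big data", "data"])
print(counts)
```

Because each call to `map_phase` depends only on its own chunk, the calls could run on separate machines, which is what lets the real framework parcel work out across a cluster and re-assign a chunk if a node goes silent.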
