Big Data Analytics

Description

Mind map on Big Data Analytics, created by chandrikasweety9 on 02/01/2014.

Resource Summary

Big Data Analytics

Note:

  • Examining large amounts of data of various types to uncover hidden patterns, unknown correlations, and other useful information.
  1. Big data

    Note:

    • A general term for large volumes of unstructured and semi-structured data. Volumes at this scale are usually expressed in petabytes and exabytes.
    • A petabyte (PB) is a measure of memory or storage capacity equal to 2 to the 50th power bytes; in decimal terms it is approximately a thousand terabytes.
    • An exabyte (EB) is a large unit of computer data storage equal to 2 to the 60th power bytes, approximately one quintillion bytes. In decimal terms, an exabyte is a billion gigabytes.
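
As a quick check on these figures, the following minimal Python sketch compares the binary (power-of-two) and decimal sizes of a petabyte and an exabyte. The constant and function names are illustrative assumptions, not part of the original notes.

```python
# Binary (power-of-two) sizes, as quoted in the notes.
PETABYTE_BINARY = 2 ** 50
EXABYTE_BINARY = 2 ** 60

# Decimal (SI) sizes for comparison.
GIGABYTE_DECIMAL = 10 ** 9
TERABYTE_DECIMAL = 10 ** 12
PETABYTE_DECIMAL = 10 ** 15
EXABYTE_DECIMAL = 10 ** 18  # one quintillion bytes


def ratio(binary_size: int, decimal_size: int) -> float:
    """Return how many times larger the binary unit is than the decimal one."""
    return binary_size / decimal_size


if __name__ == "__main__":
    # Terabytes in a decimal petabyte, gigabytes in a decimal exabyte.
    print(PETABYTE_DECIMAL // TERABYTE_DECIMAL)
    print(EXABYTE_DECIMAL // GIGABYTE_DECIMAL)
    # Gap between the binary and decimal definitions.
    print(round(ratio(PETABYTE_BINARY, PETABYTE_DECIMAL), 3))
    print(round(ratio(EXABYTE_BINARY, EXABYTE_DECIMAL), 3))
```

The binary petabyte comes out about 12.6% larger than the decimal one, and the binary exabyte about 15.3% larger, which is why the decimal equivalents are only approximations of the power-of-two values.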
    1. Unstructured data

      Note:

      • A general label for any corporate information that does not reside in a database. There are two types: textual and non-textual.
      • Textual unstructured data is generated in media such as email messages, PowerPoint presentations, Word documents, collaboration software and instant messages.
      • Non-textual unstructured data is generated in media such as JPEG images, MP3 audio files and Flash video files.
      1. Primary goal

        Note:

        • To discover repeatable business patterns.
      2. Primary goal

        Note:

        • To help companies make better business decisions by enabling data scientists and other users to analyze huge volumes of transaction data, as well as other data sources that may be left untapped by conventional business intelligence (BI) programs.
        • A data scientist is a job title for an employee or business intelligence (BI) consultant who excels at analyzing data, particularly large amounts of data, to help a business gain a competitive edge.
        • A data scientist possesses a combination of analytic, machine learning, data mining and statistical skills, as well as experience with algorithms and coding. They can explain the significance of data in a way that is easily understood by others.
        1. Technologies
          1. NoSQL

            Note:

            • A NoSQL database, also called Not Only SQL, is an approach to data management and database design that is useful for very large sets of distributed data.
            • NoSQL is especially useful when an enterprise needs to access and analyze massive amounts of unstructured data, or data that is stored remotely on multiple virtual servers in the cloud.
            • One of the most popular NoSQL databases is Apache Cassandra. Cassandra, which was once Facebook's proprietary database, was released as open source in 2008. Other NoSQL implementations include SimpleDB, Google Bigtable, MemcacheDB, and Voldemort. Apache Hadoop and MapReduce are often listed alongside them, although they are data processing frameworks rather than databases. Companies that use NoSQL include Netflix, LinkedIn and Twitter.
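
To illustrate why a schema-less model suits unstructured data, here is a minimal in-memory sketch of a key-value document store in plain Python. It is not the API of Cassandra or of any other product named in these notes; the class `TinyDocumentStore` and its methods are hypothetical names chosen for illustration, and a real NoSQL system would also distribute the data across many servers.

```python
from typing import Any, Dict, List


class TinyDocumentStore:
    """Hypothetical in-memory store: each key maps to a free-form document."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, document: Dict[str, Any]) -> None:
        # No schema is enforced: documents may carry different fields.
        self._docs[key] = dict(document)

    def get(self, key: str) -> Dict[str, Any]:
        return self._docs[key]

    def find(self, field: str, value: Any) -> List[str]:
        # Return the sorted keys of documents whose `field` equals `value`,
        # skipping documents that do not have the field at all.
        return sorted(
            key
            for key, doc in self._docs.items()
            if field in doc and doc[field] == value
        )


if __name__ == "__main__":
    store = TinyDocumentStore()
    store.put("msg:1", {"type": "email", "subject": "Q3 report", "attachments": 2})
    store.put("img:7", {"type": "image", "format": "JPEG", "width": 800})
    store.put("msg:2", {"type": "email", "subject": "Re: Q3 report"})
    print(store.find("type", "email"))
```

The three documents share only the `type` field, yet all can be stored and queried without declaring a table layout first, which is the property that makes this style of storage convenient for mixed textual and non-textual records.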
            1. Hadoop

              Note:

              • Hadoop was created by Doug Cutting and Mike Cafarella. It is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
              • It is part of the Apache project sponsored by the Apache Software Foundation.
              1. MapReduce

                Note:

                • MapReduce is a software framework that allows developers to write programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers. It was developed at Google for indexing web pages and replaced the company's original indexing algorithms and heuristics in 2004.
                • The framework is divided into two parts:
                  1. Map, a function that parcels out work to different nodes in the distributed cluster.
                  2. Reduce, a function that collates the work and resolves the results into a single value.
                • The MapReduce framework is fault-tolerant because each node in the cluster is expected to report back periodically with completed work and status updates. If a node remains silent for longer than the expected interval, a master node makes note and re-assigns the work to other nodes.
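
The map-then-reduce programming model can be sketched as a single-process word count in Python: a map step emits a (word, 1) pair per word, the pairs are grouped by word, and a reduce step sums each group. This is a toy illustration of the model only, not Google's or Hadoop's implementation. There is no cluster, no master node and no fault tolerance, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key so each key can be reduced independently."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    # In a real cluster each chunk would be mapped on a different node;
    # here the chunks are simply processed one after another.
    emitted = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big", "data analytics"]))
```

Because each chunk is mapped independently and each key is reduced independently, both steps could in principle be spread across many machines, which is what a distributed framework does on the developer's behalf.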