Big Data Analytics

Description

Mind map on Big Data Analytics, created by chandrikasweety9 on 02/01/2014.

Resource summary

Big Data Analytics

Notes:

  • Examining large amounts of data of various types to uncover hidden patterns, unknown correlations and other useful information.
  1. Big data

    Notes:

    • A general term for very large volumes of unstructured and semi-structured data. The volumes involved are typically specified in petabytes and exabytes.
    • A petabyte (PB) is a measure of memory or storage capacity equal to 2 to the 50th power bytes in binary terms. In decimal terms it is approximately a thousand terabytes.
    • An exabyte (EB) is a large unit of computer data storage equal to 2 to the 60th power bytes, approximately one quintillion bytes. In decimal terms an exabyte is a billion gigabytes.
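The unit sizes above can be checked with a few lines of arithmetic. This is only an illustrative sketch; the constant names are made up for this example and do not come from the original notes.

```python
# Binary (power-of-two) sizes as quoted in the notes.
PETABYTE_BINARY = 2 ** 50
EXABYTE_BINARY = 2 ** 60

# Decimal (power-of-ten) sizes used for the "approximately" comparisons.
GIGABYTE_DECIMAL = 10 ** 9
TERABYTE_DECIMAL = 10 ** 12
PETABYTE_DECIMAL = 10 ** 15
EXABYTE_DECIMAL = 10 ** 18

# A decimal petabyte is exactly a thousand decimal terabytes.
terabytes_per_petabyte = PETABYTE_DECIMAL // TERABYTE_DECIMAL

# A decimal exabyte is exactly a billion decimal gigabytes.
gigabytes_per_exabyte = EXABYTE_DECIMAL // GIGABYTE_DECIMAL

# The binary units are somewhat larger than their decimal counterparts.
petabyte_ratio = PETABYTE_BINARY / PETABYTE_DECIMAL
exabyte_ratio = EXABYTE_BINARY / EXABYTE_DECIMAL

print(terabytes_per_petabyte, gigabytes_per_exabyte)
print(round(petabyte_ratio, 3), round(exabyte_ratio, 3))
```

The ratios show why "approximately" is the right word: the binary and decimal definitions differ by more than ten percent at this scale.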
    1. Unstructured data

      Notes:

      • A general label for any corporate information that does not reside in a database. There are two types: textual and non-textual.
      • Textual unstructured data is generated in media such as email messages, PowerPoint presentations, Word documents, collaboration software and instant messages.
      • Non-textual unstructured data is generated in media such as JPEG images, MP3 audio files and Flash video files.
      1. Primary goal

        Notes:

        • To discover repeatable business patterns.
      2. Primary goal

        Notes:

        • To help companies make better business decisions by enabling data scientists and other users to analyze huge volumes of transaction data, as well as other data sources that may be left untapped by conventional business intelligence (BI) programs.
        • Data scientist is a job title for an employee or business intelligence (BI) consultant who excels at analyzing data, particularly large amounts of data, to help a business gain a competitive edge.
        • A data scientist combines analytic, machine learning, data mining and statistical skills with experience in algorithms and coding, and can explain the significance of data in a way that others can easily understand.
        1. Technologies
          1. NoSQL

            Notes:

            • NoSQL, also called Not Only SQL, is an approach to data management and database design that is useful for very large sets of distributed data.
            • NoSQL is especially useful when an enterprise needs to access and analyze massive amounts of unstructured data, or data that is stored remotely on multiple virtual servers in the cloud.
            • One of the most popular NoSQL databases is Apache Cassandra. Cassandra, which was once Facebook's proprietary database, was released as open source in 2008. Other NoSQL implementations include SimpleDB, Google BigTable, MemcacheDB and Voldemort. Apache Hadoop and MapReduce are often listed alongside them, although they are data processing frameworks rather than databases. Companies that use NoSQL include Netflix, LinkedIn and Twitter.
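The idea of schema-free data spread across several servers can be sketched with a toy key-value store. This is a minimal illustration under stated assumptions, not how Cassandra or any other product named above works: the class `TinyKeyValueStore`, its methods and the sample keys are all hypothetical, and the "nodes" are plain in-memory dictionaries.

```python
import hashlib


class TinyKeyValueStore:
    """Toy key-value store that spreads keys over several in-memory 'nodes'."""

    def __init__(self, node_count=3):
        # Each dictionary stands in for one server in a cluster.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Hash the key so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key, document):
        self.nodes[self._node_for(key)][key] = document

    def get(self, key, default=None):
        return self.nodes[self._node_for(key)].get(key, default)


store = TinyKeyValueStore(node_count=3)
# The two documents have different fields: no fixed schema is enforced.
store.put("user:1", {"name": "Ada", "tags": ["admin"]})
store.put("user:2", {"name": "Lin"})
print(store.get("user:1"))
```

Real systems add replication, persistence and rebalancing when nodes join or leave; none of that is modelled here.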
            1. Hadoop

              Notes:

              • Hadoop was created by Doug Cutting and Mike Cafarella. It is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
              • It is part of the Apache project sponsored by the Apache Software Foundation.
              1. MapReduce

                Notes:

                • MapReduce is a software framework that allows developers to write programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers. It was developed at Google for indexing Web pages and replaced their original indexing algorithms and heuristics in 2004.
                • The framework is divided into two parts:
                  1. Map, a function that parcels out work to different nodes in the distributed cluster.
                  2. Reduce, a function that collates the work and resolves the results into a single value.
                • The MapReduce framework is fault-tolerant because each node in the cluster is expected to report back periodically with completed work and status updates. If a node remains silent for longer than the expected interval, a master node makes note and re-assigns the work to other nodes.
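The map step, together with the reduce step that conventionally follows it, can be sketched as a single-process word count. This is a simplified illustration, not Hadoop's or Google's API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are invented for this example, and everything runs on one machine rather than across a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = []
    # On a real cluster each document could be mapped on a different node.
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data", "big clusters process data"])
print(counts)
```

Because each call to `map_phase` depends only on its own document, the calls are independent, which is what lets a framework hand them to different nodes and re-run them elsewhere if a node goes silent.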