Post #6: Traditional Statistics: Descriptive and Inferential in Big Data

Traditional statistics in Big Data falls into two types: descriptive and inferential. Descriptive statistics summarises a data set and describes its characteristics, typically through averages and other figures calculated from sets of numbers. Inferential statistics, by contrast, uses a sample to draw conclusions about a wider population. Descriptive results are usually displayed in tables, charts and similar forms, but are most commonly reported as a measure of central tendency. A central tendency is a typical value for a distribution; it is also known as the location or centre of the distribution. The most common measures of central tendency are the arithmetic mean, the median and the mode. The mean is the sum of all the values divided by the number of values, the median is the middle value once the data set is sorted (or the average of the two middle values when the count is even), and the mode is the most frequent value in the data set.
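As a rough illustration of the three measures of central tendency described above, here is a minimal Python sketch using the standard library's `statistics` module. The `describe` helper and the sample numbers are made up for this example and are not taken from any real data set.

```python
from statistics import mean, median, mode


def describe(values):
    """Return the mean, median and mode of a list of numbers.

    Hypothetical helper written for this post, not a library function.
    """
    return {
        "mean": mean(values),      # sum of the values divided by their count
        "median": median(values),  # middle value of the sorted data
        "mode": mode(values),      # most frequent value
    }


# Made-up sample data, purely for illustration
sample = [2, 3, 3, 5, 7, 10]
print(describe(sample))
```

For this sample the mean is 5, the median is 4 (the average of the two middle values, 3 and 5) and the mode is 3, which shows that the three measures can give different "typical" values for the same data.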

The goal of traditional statistics is to analyse and summarise data, making strict assumptions about the problem and the data distributions, and using conservative techniques and approaches.
