Two of the biggest challenges of big data are analysing and visualising the data.
Firstly, with analysing the data, big data files can be substantial in size, so several things must be considered before downloading: the file size, how long the file will take to download, whether all of it is needed or part of the file will suffice, and whether the system has enough storage space. Visualisation is a way to represent the data in a form that is easier to understand, such as a word cloud. This helps users see the prominent and key terms from the analysis of the data sets.
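The pre-download checks could be sketched in Python roughly as below. This is only an illustration: the function name `download_plan`, the 2 GB file size and the 50 Mbit/s connection speed are assumptions made for the example, not figures from any real data set.

```python
import shutil

def download_plan(file_size_bytes, bandwidth_bits_per_sec, target_dir="."):
    """Estimate download time and check whether the file would fit on disk."""
    # Convert bytes to bits, then divide by the connection speed.
    seconds = file_size_bytes * 8 / bandwidth_bits_per_sec
    # Free space on the drive that holds target_dir.
    free_bytes = shutil.disk_usage(target_dir).free
    return {
        "estimated_seconds": seconds,
        "free_bytes": free_bytes,
        "fits_on_disk": file_size_bytes < free_bytes,
    }

# Hypothetical example: a 2 GB file over a 50 Mbit/s connection.
plan = download_plan(2 * 10**9, 50 * 10**6)
print(plan["estimated_seconds"])  # 320.0 seconds, a little over five minutes
print(plan["fits_on_disk"])
```

If the estimate is too long or the file does not fit, that is the point at which downloading only part of the file becomes worth considering.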
The first step after downloading the data would be to quality check it, to ensure that each field contains the appropriate data type and that the user understands the meaning of each field.
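A quality check of this kind might look something like the following sketch. The field names (`household_id`, `day`, `energy_kwh`) and the sample rows are invented for illustration; a real data set would need its own schema.

```python
import csv
import io
from datetime import datetime

# Hypothetical schema: field name -> parser that raises ValueError on bad input.
EXPECTED = {
    "household_id": str,
    "day": lambda s: datetime.strptime(s, "%Y-%m-%d"),
    "energy_kwh": float,
}

def check_types(rows, expected):
    """Return (row_number, field, value) for every value that fails to parse."""
    problems = []
    for row_number, row in enumerate(rows, start=2):  # row 1 is the header
        for field, parse in expected.items():
            value = row.get(field)
            try:
                if value is None or value == "":
                    raise ValueError("missing value")
                parse(value)
            except ValueError:
                problems.append((row_number, field, value))
    return problems

# Made-up sample with one impossible date and one non-numeric reading.
sample = io.StringIO(
    "household_id,day,energy_kwh\n"
    "H001,2013-01-01,9.5\n"
    "H002,2013-01-32,7.1\n"
    "H003,2013-01-03,n/a\n"
)
problems = check_types(csv.DictReader(sample), EXPECTED)
print(problems)
# [(3, 'day', '2013-01-32'), (4, 'energy_kwh', 'n/a')]
```

A check like this only confirms that values parse as the expected type. Understanding what each field means still depends on reading the documentation that comes with the data.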
Keeping a copy of the original data would be essential, as would documenting each version change at each stage of visualisation. This version documentation would allow users to create working copies while keeping the original as a reference.
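One possible way to protect the original while creating labelled working copies is sketched below. The helper name `make_working_copy`, the file name and the version label are all hypothetical, and marking the original read-only is just one convention among several.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path

def make_working_copy(original, version_label):
    """Copy the original to a versioned working file, then mark the original read-only."""
    original = Path(original)
    working = original.with_name(f"{original.stem}_{version_label}{original.suffix}")
    shutil.copy2(original, working)   # copy first, so the working copy stays writable
    os.chmod(original, stat.S_IREAD)  # discourage accidental edits to the original
    return working

# Demonstration in a temporary folder with a made-up file.
with tempfile.TemporaryDirectory() as folder:
    raw = Path(folder) / "energy.csv"
    raw.write_text("household_id,energy_kwh\nH001,9.5\n")
    copy = make_working_copy(raw, "v1_cleaned")
    print(copy.name)  # energy_v1_cleaned.csv
```

Putting the stage in the file name gives a simple record of which version each visualisation was built from.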
Databases, programs or data warehousing platforms can be used as alternatives to spreadsheets for analysing data sets.
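As a small illustration of the database route, the sketch below uses SQLite through Python's built-in `sqlite3` module. The table name, column names and readings are made up for the example.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (household_id TEXT, acorn_group TEXT, energy_kwh REAL)"
)

# Made-up rows purely for illustration.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("H001", "Affluent", 10.0),
        ("H002", "Affluent", 14.0),
        ("H003", "Comfortable", 8.0),
    ],
)

# Average consumption per group, computed by the database rather than by hand.
rows = conn.execute(
    "SELECT acorn_group, AVG(energy_kwh) FROM readings "
    "GROUP BY acorn_group ORDER BY acorn_group"
).fetchall()
print(rows)  # [('Affluent', 12.0), ('Comfortable', 8.0)]
conn.close()
```

The same query would work unchanged on a table with millions of rows, which is where a spreadsheet tends to struggle.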
Predictive analysis would be the most appropriate form of analysis in this particular case, to analyse energy consumption over time in relation to historical data provided by the government, alongside comparative analysis in relation to the comfortable/affluent ACORN group.
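A very simple form of predictive analysis is to fit a straight-line trend to past values and extend it forward. The sketch below does this with ordinary least squares. The numbers are toy values chosen so the arithmetic is easy to follow, not real consumption figures, and a straight line is only one of many possible models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy values: period index and consumption in arbitrary units.
periods = [0, 1, 2, 3]
consumption = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(periods, consumption)
forecast = slope * 4 + intercept  # extend the trend one period ahead
print(slope, intercept, forecast)  # 2.0 10.0 18.0
```

Fitting a separate line for each group would be one way to combine the predictive and comparative sides of the analysis.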
Metadata, otherwise known as data that describes other data, covers a vast range of information: who recorded the data, why it was recorded, what units it is in, or what copyright applies to its use. Metadata is a vital resource for data scientists, who more often than not need more data to explore and compare with other data sets, often using data that other people have created or produced.
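As a rough illustration, metadata of this kind could be kept in a small record alongside the data file. The field names below are assumptions chosen to mirror the examples above, and the values are placeholders rather than details of any real data set.

```python
import json

# Placeholder metadata record; every value here is illustrative.
metadata = {
    "recorded_by": "placeholder: organisation that collected the readings",
    "purpose": "placeholder: why the data was recorded",
    "units": {"energy_kwh": "kilowatt-hours"},
    "licence": "placeholder: copyright or licence terms",
}

def missing_fields(record, required=("recorded_by", "purpose", "units", "licence")):
    """List the required metadata fields that are absent or empty."""
    return [name for name in required if not record.get(name)]

print(json.dumps(metadata, indent=2))
print(missing_fields(metadata))  # []
```

Checking for missing fields before reusing someone else's data is a quick way to spot gaps, such as unknown units or unclear copyright, before they affect a comparison.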