RSS

Tag Archives: CIS8008

CIS8008, Semester 2 2017 Assignment 2 Specifications

CIS8008, Semester 2 2017 Assignment 2 Specifications
University of Southern Queensland © 2017 1

Description Possible Marks and Wtg(%) Word
Count
Due date
Assignment 2 Written Practical Report 100 marks 15% Weighting 2000 14/08/17

Learning objectives
The key frameworks and concepts covered in modules 1–6 are particularly relevant for this
assignment. Assignment 2 relates to the course learning objectives 1, 2 and 4:
1. demonstrate applied knowledge of people, markets, finances, technology and management
in a global context of business intelligence practice (data warehouse design, data mining
process, data visualisation and performance management) and resulting organisational
change and how these apply to implementation of business intelligence in organisation
systems and business processes
2. identify and solve complex organisational problems creatively and practically through the use
of business intelligence and critically reflect on how evidence based decision making and
sustainable business performance management can effectively address real world problems
4. demonstrate the ability to communicate effectively in a clear and concise manner in written
report style for senior management with correct and appropriate acknowledgment of main
ideas presented and discussed.
Software Requirements
You must use RapidMiner Studio for Task 2 and you must use Tableau Desktop for Task
3
in this Assignment 2. Failure to do so may result in Task 2 and/or Task 3 not being marked
and zero marks awarded.
Academic integrity
Note carefully University policy on Academic Misconduct such as plagiarism, collusion and
cheating.
Your Assignment 2 submission is automatically submitted to and checked in
Turnitin for academic integrity
when you submit your Assignment 2 via the course study
Assignment 2 submission link
. If any of these occur they will be found and dealt with by the
USQ Academic Integrity Procedures. If proven, Academic Misconduct may result in failure
of an individual assessment, the entire course or exclusion from a University program or
programs.
This assignments should be completed by each student individually (not in group).
Late submission policy
USQ’s Assessment – Assignment (Late Submission) and Compassionate and Compelling
Circumstances procedures relate to Extensions and Late Assignments. They can be found under
the following links:
Assessment – Assignment (Late Submission) Procedure
Assessment of Compassionate and Compelling Circumstances Procedures:
An Assignment submitted after the due date without an approved extension of time will be
penalised. The penalty for late submission is a
reduction by five percent (5%) of the maximum
mark
applicable for the assignment, for each University Business Day or part Business Day that
the assignment is late.

CIS8008, Semester 2 2017 Assignment 2 Specifications
University of Southern Queensland © 2017 2
Assignment 2 consists of three main tasks and a number of sub tasks
Task 1 Data Management Architectures (Worth 30 Marks)
For Task 1 you are required to conduct a critical literature review of the related concepts of data
warehouses, data lakes and data marts in order to complete two sub tasks.
Task 1.1) Define and discuss the concepts of (1) a data warehouse, (2) a data lake and (3) a data
mart drawing relevant and reputable literature (about 500 words).
Task 1.2) Explain how a data warehouse, a data lake and a data mart would be used in an
organization and in your answer provide some real world examples of how each would be used in
an organization (about 500 words).
Task 2 Exploratory Data Analysis and Linear Regression Analysis (Worth 40
Marks)
For Task 2 consider a set of observations on a large number of white wine varieties involving
their chemical properties and ranking by wine tasters contained in
white-wines.csv data set. Wine
industry has been growing steadily as social drinking of wine is on the rise. The price of a wine
largely depends on wine appreciation by wine tasters which may have a high degree of variability.
Another key factor in wine certification and quality assessment is physicochemical tests which
are laboratory-based and take into account factors like acidity, pH level, presence of sugar and
other chemical properties.
For wine producers, it would be of interest if wine tasters
perception of wine quality after tasting
can be related to the chemical properties of wine so that certification and quality assessment and
assurance process of wines is more rigorous.
The
white-wines.csv data set consists of 4898 white wine varieties in total (records). All wines are
from one wine producing region. The
white-wines.csv data set was collected on 12 different
properties of wines. Quality is based on sensory data (wine tasters
perception of the quality of a
wine), the rest are based on chemical properties of wines including density, acidity, alcohol
content etc. All chemical properties of wines are coded as continuous numeric variables. Quality
is an ordinal variable with a possible ranking from 1 (worst) to 10 (best). Each white wine variety
is tasted by three independent tasters and final rank assigned is the median rank given by tasters.
See Table 1 White Wines Data Set Data Dictionary for full details of
white-wines.csv data set.
Table 1 White Wines Data Set Data Dictionary

Variable Name Type and description of variable Range of values
fixed acidity Real – tartaric acid grams/litre – most acids in wines are fixed 3 through to 15
volatile acidity Real – acetic acid grams/litre – steam distillable acids 0 through to 1.2
citric acid Real – grams/litre commercially produced acid supplement 0 through to 2
residual sugar
2 – volatile acidity
3 – citric acid
4 – residual sugar
5 – chlorides
6 – free sulfur
dioxide
7 – total sulfur
dioxide
8 – density
9 – pH
10 – sulphates
11 – alcohol
Real – measured in grams of sugar per litre of wine 0 through to 70
chlorides Real – sodium chloride – grams/litre 0 through to 1
free sulfur dioxide Integer fruit preservative used in wines milligrams/litre 1 through to 300
total sulfur dioxide Integer fruit preservative used in wines milligrams/litre 1 through to 450
density Real – density – grams/cubic 0 through to 2
pH Real – strength of acidity 2 through to 4
sulphates Real – potassium sulfate – grams/litre 0 through to 2
alcohol Real percentage of volume – alcohol content of a wine 3 through to 15
quality Integer – Quality Sensory score by wine tasters between 0 and 10

CIS8008, Semester 2 2017 Assignment 2 Specifications
University of Southern Queensland © 2017 3
Note: You will need to conduct some desktop research on wine tasting and the chemical
properties of wine in order to fully understand and interpret the key findings of the exploratory
data analysis (EDA) of the
white-wines.csv data set and the key results of the Linear Regression
Models for the
white-wines.csv data set. The following resources: Wikipedia page on Acidity in
Wines
and Whats in a Wine? are a good starting point.
Task 2.1) Conduct an exploratory data analysis of the white-wines.csv data set using the
RapidMiner Studio data mining tool
. Summarise the findings of your exploratory data
analysis
in terms of describing key characteristics of each variable in the wines.csv data set such
as maximum, minimum values, average, standard deviation, most frequent values (mode),
missing values and invalid values etc and relationships of variables with other variables if relevant
in a table named
Table 2.1 Results of Exploratory Data Analysis for the White-Wines.csv
Data Set
.
Hint: The Statistics Tab and the Chart Tab in RapidMiner provide descriptive statistical information
and the ability to create useful charts like Barcharts, Scatterplots etc. You might also like to look at
running some correlations and chi square tests on the
white-wines.csv data set to indicate which
variables you consider to be the top five key variables
and which contribute most to confirming
a wine taster’s ranking of the quality of a wine. Note in completing Task 2.1 you will find it useful
to refer to the data dictionary for the
white-wines.csv data set included in this document which
defines each of the variables in terms of their data type and range of p o s s i b l e values
.
Discuss the key results of your exploratory data analysis presented in Table 2.1 and provide a
rationale for why you have selected your five top variables for predicting a wine taster
s ranking of
a white wine drawing on the results of your EDA analysis and relevant literature (About 500 words).
Task 2.2) Build a Linear Regression model for predicting the quality ranking of a white wine using
a RapidMiner data mining process and an appropriate set of data mining operators and a reduced
set of variables from the
white-wines.csv data set determined by your exploratory data analysis in
Task 2.1. Provide these outputs from RapidMiner (1) Final Linear Regression Model process and
(2) Summary Table of Results of Final Linear Regression Model for Task 2.2 for
white-wines.csv
data set.
Briefly describe your final Linear Regression Model Process, and discuss the results of the
Final Linear Regression Model for white wine.csv data set
drawing on the key outputs
(coefficient, standardised coefficients, t-statistics values, p-values and significance levels etc) for
predicting Wine Quality and relevant supporting literature on the interpretation of a Linear
Regression Model (About 500 words).
Include all appropriate RapidMiner outputs such as RapidMiner Processes, Graphs and Tables that
support the key aspects of your exploratory data analysis and linear regression model analysis of
the
white-wines.csv data set in your Assignment 2 report. Note you need export these outputs
from RapidMiner using the File/Print/Export Image option and include in Task 2 where
relevant or in Appendix A of Assignment 2 report.

CIS8008, Semester 2 2017 Assignment 2 Specifications
University of Southern Queensland © 2017 4
Task 3 Tableau Desktop View of Snow Falls at Whistler over Time (worth 20
marks)
Create a Tableau Text Table or Graph view that displays Snow falls at Whistler BC Canada and
other relevant data over time (years) using
whistler-daily-snowfall.csv. Comment on the (1) process
of preparing this data set for a Text Table or Graph view presentation using Tableau Desktop and
(2) key trends and patterns that are apparent in Text Table/Graph you created for presentation of
the
whistler-daily-snowfall.csv data set (about 200 words).
Whistler Daily Snow Fall Data Set contains: Historical daily snowfall data record at Whistler, BC,
Canada over the period July 1 1972 to December 31 2009. Measured at top of Whistler Gondola:
Latitude: 50
°04’04.000″ N Longitude: 122°56’50.000″ W Elevation: 1835.00 m. Data Source:
http://www.climate.weatheroffice.ec.gc.ca/ (Whistler Roundhouse station, identifier 1108906).
Note you will need to ensure that Tableau Desktop recognizes the variables in the whistler-dailysnowfall.csv data set as correct type and Note there are a lot of missing values (coded as nan) or
null values in this data set for the earlier years 1970s etc.
Table 2 Whistler Daily Snow Fall Data Set Data Dictionary

Variable Name Type and description of variable Range of values
Year Integer – Years for snow fall recorded at Whistler BC Canada 1972 to 2009
Month Integer – Months January through December for each year 1 through to 12
Day Integer – Days of each month of each year 1 through to 31
MaxTemp
2 – volatile
acidity
3 – citric acid
4 – residual
sugar
5 – chlorides
6 – free sulfur
dioxide
7 – total sulfur
dioxide
8 – density
9 – pH
10 – sulphates
11 – alcohol
Output variable
(based on sensory
data):
12 – quality
(score between 0
and 10)
Real – measured in Celsius
MinTemp Real – measured in Celsius
MeanTemp Real – measured in Celsius
HeatDegDays Real – measured in Celsius
CoolDegDays Real – measured in Celsius
TotalRain Real – rainfall is measured in mm
TotalSnow Real – snow is measured in cm
TotalPrecip Real total precipitation is measured in mm
SnowOnGrd Real snow on ground is measured in cm

Report presentation, writing style and referencing (worth 10 marks)
Your Assignment 2 must be presented in report format, written in an appropriate and critical style
and supported where required with in text references using Harvard Referencing Style
Your assignment 2 report must be structured in report format as follows:
Cover Page: Including course code/name, Student name & ID, assignment title (e.g. “Assignment 2)
Table of Contents
Body of report
Task 1
1.1

1.2
Task 2

2.1
2.2
Task 3
List of References
Appendices

CIS8008, Semester 2 2017 Assignment 2 Specifications
University of Southern Queensland © 2017 5
You must submit two files for Assignment 2:
1. Assignment 2 Report for Tasks 1, 2 and 3 in Word document format with extension
.docx
2. Tableau packaged workbook with the extension .twbx contains required Text Table
/ Graph view for Task 3
You must the following file naming convention:
1. Student ID_Student name_CIS8008_Ass2.docx
2. Student ID_ Student name_CIS8008_Ass2.twbx
Harvard referencing resources
Install a bibliography referencing tool – Endnote which integrates with your word processor.
http://www.usq.edu.au/library/referencing/endnote-bibliographic-software or alternatively use an
online citation tool such as
Zetoro or You Cite This For Me
USQ Library how to reference correctly using the Harvard referencing system
https://www.usq.edu.au/library/referencing/harvard-agps-referencing-guide

 
Leave a comment

Posted by on August 9, 2017 in academic writing, Academic Writing

 

Tags: , ,