• Developed and analysed a data warehouse using the SSIS ETL tool, SSDT and SQL Server.
• Produced analysed quarterly reports in SSRS covering total sales, total revenue, predicted future sales, top-selling products and most-discounted products.
• Applied performance tuning to fetch rows from the database faster, and produced data visualizations using RStudio and Neo4j.
CONTENTS
1. Introduction
1.2. Reasons for selecting the subject area
1.3. Vision and Goals
1.4. Key Stakeholders
1.5. Business requirements
2. SCHEMA
3. ETL
4. VISUALIZATIONS AND REPORTS
4.1. Visualizations
4.2. Reports
5. XML and Schema
6. Graph Databases
6.1. Comparison to relational databases
7. Conclusions
8. Bibliography
Appendix A – VISUALIZATIONS Code
Appendix B – Neo4j code
1. INTRODUCTION
Today's world is changing at an ever faster rate. Growing a business or organization is a daunting task, because enormous effort is required to grow and run a business successfully. For any company to flourish, the key factor is customer satisfaction. A company's ability to keep up with constant change is always being tested, and such situations drive the innovation that leads to success.
Every company aims for higher profit, so data plays an integral role. Huge amounts of data are created every day, at very high speed. This data is measured in gigabytes, is often unstructured, and is growing at an exponential rate, to the point of becoming hard to control. As data grows, fetching and studying it becomes a tedious job, and by the time an algorithm has been created, the patterns in the data tend to have changed. In such cases the data team plays a key role: it presents insights to the head of the company, its stakeholders, and the business, marketing and finance units, so that they can support and grow the company's activities. As mentioned before, with ever-increasing data nothing stays constant, so new algorithms for structuring the data have to be built, allowing the company to address its customers' needs more effectively.
Business intelligence (BI) is a technology-driven process for analyzing data and
presenting actionable information to help executives, managers and other corporate end
users make informed business decisions. BI consists of a wide variety of tools,
applications and methodologies that enable organizations to collect data from internal
systems and external sources, prepare it for analysis, develop and run queries against
that data, and create reports, dashboards and data visualizations to make the analytical
results available to corporate decision makers, as well as operational workers.
1.2. REASONS FOR SELECTING THE SUBJECT AREA AND DATA
The Flipkart dataset used in this assignment is a sample dataset in SQL Server that contains a large volume of data. The main aim of this assignment is to create a data warehouse (data mart) and, using ETL processes, to deliver reports, a series of dashboards, well-defined visualizations and the business conditions they address.
This is a pre-crawled dataset, taken as a subset of a bigger dataset (more than 5.8 million products) that was created by extracting data from the prominent Indian e-commerce giant Flipkart. It contains product listings.
The data has been taken from data.world.
(Source: https://data.world/promptcloud/product-details-on-flipkart-com)
1.3. VISION AND GOALS
In this project we chose the Flipkart dataset to represent product sales on Flipkart's e-commerce website by category, brand, discounted price and time. We visualize how each factor, modelled as a dimension, has a direct or indirect impact on product sales.
➢ Goals
To provide better discounts so that more customers are attracted and sales improve.
To manage inventory according to customers' requirements.
To analyze product sales by category and brand for better-targeted marketing and advertising on the website.
1.4. KEY STAKEHOLDERS
• Brand Owner
• Manufacturer
• Customer
• Logistics and Shipping
1.5. BUSINESS REQUIREMENTS
Business requirements focus on information needs. Working with any dataset requires identifying and analyzing the data requirements: what kind of data to extract from the database, and which reports to generate to meet the business requirement. Hence we framed our dataset around the following business requirements:
1) What is the revenue of each brand in 2015 and 2016?
2) How many brands and product categories have a product priced above 50,000?
3) What is the total number of brands, and what is each brand's revenue?
4) What are the total units sold and revenue by product category across both years?
5) What are the quarter-wise sales of the top 10 product categories across brands?
Tools
A. Data Warehousing Tools
❖ Microsoft SQL Server Management Studio (SSMS)
❖ Microsoft SQL Server Integration Services (SSIS)
B. Reporting Tool:
❖ Microsoft SQL Server Reporting Services (SSRS)
C. Visualization Tool:
❖ RStudio
D. Graph data visualization
❖ Neo4j
2. SCHEMA
Dimensional Model
For our dimensional model we chose the star schema, because it is easy to build an ETL process for it and each dimension table is connected directly to the fact table. The schema looks like a star: the fact table sits at the centre as the pivot, and multiple dimensions are attached to it, each related to the fact table via a foreign key. The fact table also contains measurable quantities; these computed columns help us analyze business profit.
Dimensions of the Data Warehouse
Each dimension table is made up of descriptive columns such as Brand_Name and Category_Name. Each dimension has its own primary key, which uniquely identifies each row of that dimension.
Fact table of the Data Warehouse
The fact table contains the quantitative data that we store for our dimensions. It is the central point of the star schema and contains the primary keys of all dimensions (as foreign keys) together with the measurable quantities.
The fact table is designed to give insights into the revenue hierarchy, such as which brand sells best or which category has the highest revenue, as well as how to manage the inventory of multiple products properly. We can also improve sales by targeting advertising and marketing according to units sold per category and brand.
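To make the star layout concrete, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for SQL Server (an assumption for illustration; the project itself uses T-SQL). It builds trimmed-down Brand_Dim, Category_Dim and Ecom_Fact tables filled with hypothetical rows, then answers a revenue-by-category and a revenue-by-brand question, each with a single join from the fact table to one dimension.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (illustration only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two dimensions and one fact table, trimmed to a few columns.
cur.executescript("""
CREATE TABLE Brand_Dim (Brand_Key INTEGER PRIMARY KEY, Brand_Name TEXT);
CREATE TABLE Category_Dim (Category_Key INTEGER PRIMARY KEY, Category_Name TEXT);
CREATE TABLE Ecom_Fact (
    Brand_Key INTEGER REFERENCES Brand_Dim(Brand_Key),
    Category_Key INTEGER REFERENCES Category_Dim(Category_Key),
    Discounted_Cost INTEGER
);
""")

# Hypothetical sample rows, not taken from the Flipkart data.
cur.executemany("INSERT INTO Brand_Dim VALUES (?, ?)", [(1, "BrandA"), (2, "BrandB")])
cur.executemany("INSERT INTO Category_Dim VALUES (?, ?)", [(1, "Jewellery"), (2, "Furniture")])
cur.executemany(
    "INSERT INTO Ecom_Fact VALUES (?, ?, ?)",
    [(1, 1, 500), (1, 1, 300), (2, 2, 1200), (2, 1, 100)],
)

# Each report is one join from the fact table to one dimension.
rows = cur.execute("""
SELECT c.Category_Name, SUM(f.Discounted_Cost) AS Revenue
FROM Ecom_Fact f JOIN Category_Dim c ON f.Category_Key = c.Category_Key
GROUP BY c.Category_Name ORDER BY Revenue DESC
""").fetchall()

brand_rows = cur.execute("""
SELECT b.Brand_Name, SUM(f.Discounted_Cost) AS Revenue
FROM Ecom_Fact f JOIN Brand_Dim b ON f.Brand_Key = b.Brand_Key
GROUP BY b.Brand_Name ORDER BY Revenue DESC
""").fetchall()

print(rows)        # [('Furniture', 1200), ('Jewellery', 900)]
print(brand_rows)  # [('BrandB', 1300), ('BrandA', 800)]
```

Because every dimension hangs directly off the fact table, adding another breakdown (for example by quarter) would be one more join to a calendar dimension rather than a chain of joins.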
Data Warehouse Design and Architecture:
The warehouse is designed to analyze this e-commerce website from different angles: how much revenue products generate by category, brand and discounted price, and how many products are sold in a given month, year or season.
CREATE DATABASE FKART_DW
GO
USE FKART_DW
GO
-- Creating Brand Dimension
CREATE TABLE Brand_Dim(
Brand_Key INT NOT NULL IDENTITY PRIMARY KEY,
Brand_ID VARCHAR(10),
Brand_Name VARCHAR(50));
GO
CREATE UNIQUE INDEX B_Index ON Brand_Dim(Brand_Key,Brand_ID)
GO
-- Creating Category Dimension
CREATE TABLE Category_Dim(
Category_Key INT NOT NULL IDENTITY PRIMARY KEY,
Category_ID INT,
Category_Name VARCHAR(50));
GO
CREATE UNIQUE INDEX C_Index ON Category_Dim(Category_Key,Category_ID)
GO
-- Creating Calendar Dimension
CREATE TABLE Calender_Dim(
Calender_Key INT NOT NULL IDENTITY PRIMARY KEY,
Calender_ID INT,
Full_Date DATE,
Day_of_Week VARCHAR(20),
Day_of_Month INT,
Month_ INT,
Quarter_ VARCHAR(10),
Year_ INT);
GO
CREATE UNIQUE INDEX D_Index ON Calender_Dim(Calender_Key,Calender_ID)
GO
-- Creating Order Dimension
CREATE TABLE Order_Dim(
Order_Key INT NOT NULL IDENTITY PRIMARY KEY,
Order_ID VARCHAR(50),
Order_date DATE,
URL_ NVARCHAR(255),
Category_Name VARCHAR(150),
Order_Details NVARCHAR(255),
Retail_Cost INT,
Discounted_Cost INT);
GO
CREATE UNIQUE INDEX O_Index ON Order_Dim(Order_Key,Order_ID)
GO
-- Creating Fact Table (one foreign key per dimension plus the measures)
CREATE TABLE Ecom_Fact(
Order_Key INT REFERENCES Order_Dim(Order_Key),
Calender_Key INT REFERENCES Calender_Dim(Calender_Key),
Category_Key INT REFERENCES Category_Dim(Category_Key),
Brand_Key INT REFERENCES Brand_Dim(Brand_Key),
Retail_Cost INT,
Discounted_Cost INT,
CategoryWise_Rev INT,
-- Brand_Rev and CategoryUnit_sold are the measures described for the fact table; INT is assumed
Brand_Rev INT,
CategoryUnit_sold INT);
GO
3. ETL
Making of the data warehouse through ETL
In our project we use Microsoft SQL Server Integration Services (SSIS) to load our data into the database.
To achieve this we created five SSIS packages. ETL is the general procedure for loading data from one or more sources into a destination; any supported source and destination format can be used, such as flat files, Excel files or ADO.NET connections.
In this project ETL is applied to four dimensions whose sources are CSV files. We extracted this data into staging tables, populated the dimension tables from the staging tables, and finally, using the SSIS Lookup transformation (which acts as a join), populated the fact table.
This process is explained below with screenshots.
Overall ETL Process: flat-file source → staging area → loading of the database dimensions → making of the fact table
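The staging → dimension → fact flow can be sketched in Python with sqlite3 standing in for SQL Server and SSIS (an assumption for illustration; the project uses SSIS packages, and names such as Stg_Orders are hypothetical). The final INSERT ... SELECT with a join does the job of the SSIS Lookup transformation: it replaces each business key (Brand_ID) with the surrogate key (Brand_Key) generated when the dimension was loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1. Extract: raw CSV rows land in a staging table (hypothetical columns).
cur.execute("CREATE TABLE Stg_Orders (Order_ID TEXT, Brand_ID TEXT, Brand_Name TEXT, Discounted_Cost INTEGER)")
cur.executemany(
    "INSERT INTO Stg_Orders VALUES (?, ?, ?, ?)",
    [("O1", "B10", "BrandA", 500), ("O2", "B10", "BrandA", 300), ("O3", "B20", "BrandB", 1200)],
)

# 2. Load the dimension: one row per distinct business key; the database
#    generates the surrogate key, much as IDENTITY does in SQL Server.
cur.execute("CREATE TABLE Brand_Dim (Brand_Key INTEGER PRIMARY KEY AUTOINCREMENT, Brand_ID TEXT UNIQUE, Brand_Name TEXT)")
cur.execute("INSERT INTO Brand_Dim (Brand_ID, Brand_Name) SELECT DISTINCT Brand_ID, Brand_Name FROM Stg_Orders")

# 3. Load the fact table: the join swaps each Brand_ID for its Brand_Key,
#    the role played by the Lookup transformation in the SSIS package.
cur.execute("CREATE TABLE Ecom_Fact (Order_ID TEXT, Brand_Key INTEGER, Discounted_Cost INTEGER)")
cur.execute("""
INSERT INTO Ecom_Fact (Order_ID, Brand_Key, Discounted_Cost)
SELECT s.Order_ID, d.Brand_Key, s.Discounted_Cost
FROM Stg_Orders s JOIN Brand_Dim d ON s.Brand_ID = d.Brand_ID
""")

brand_count = cur.execute("SELECT COUNT(*) FROM Brand_Dim").fetchone()[0]
fact_rows = cur.execute("SELECT Order_ID, Brand_Key, Discounted_Cost FROM Ecom_Fact ORDER BY Order_ID").fetchall()
print(brand_count, fact_rows)
```

The two orders for the same brand end up sharing one Brand_Key, while the dimension holds a single row per brand.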
Brand_Dim:
The Brand dimension consists of Brand_ID, Brand_Name and Brand_Key. Brand_Key is the primary key of this dimension. It is generated when the Brand dimension is loaded into the database, through the column definition Brand_Key INT NOT NULL IDENTITY PRIMARY KEY; then, in the SSIS Advanced Editor, we set the sort key position to 1. You might wonder why we generated this key when Brand_ID already exists. A primary key must be unique, i.e. no value may repeat, but because the source data is at order level, purchases of products from the same brand repeat that brand's ID, so Brand_ID alone is not distinct. To remove this redundancy we generated Brand_Key as the primary key of this dimension.
The remaining columns, Brand_Name and Brand_ID, contain the name and ID of each brand respectively. With them we can analyse which brand sells the most, and calculate brand-wise units sold.
Category_Dim:
The Category dimension has Category_Key as its primary key. Category_ID contains the ID of the category and Category_Name its name. Using this dimension we can analyze which category generates the highest revenue and calculate category-wise units sold.
Order_Dim:
The Order dimension has Order_Key as its primary key. Order_ID is the ID of a particular order, and Order_date contains the date on which the order was executed. URL_ links to the order details; it is useful for viewing a particular order with its price tag, image and so on. Finally, Category_Name, Order_Details (the product name), Retail_Cost and Discounted_Cost are also stored in this dimension. With these we can create a hierarchical view of revenue by category.
Calender_Dim:
The Calendar dimension has Calender_Key as its primary key. Calender_ID is created from Order_date. Full_Date contains the date on which the order was executed, Day_of_Week the day of the week, and likewise Day_of_Month, Month_, Quarter_ and Year_. This dimension is used to calculate yearly, monthly and quarterly revenue.
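The derived calendar columns can be computed from an order date as in the sketch below, which uses only Python's standard datetime module. The yyyymmdd encoding of Calender_ID is a hypothetical choice, since the report only says Calender_ID is created from Order_date; quarters are labelled Q1 to Q4 to match the Quarter_ values in the SSRS reports.

```python
from datetime import date

# Day names indexed by date.weekday() (Monday == 0), avoiding locale-dependent strftime.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def calendar_row(d: date) -> dict:
    """Build one Calender_Dim row (without the surrogate key) from an order date."""
    return {
        # Hypothetical encoding: 2015-12-31 -> 20151231.
        "Calender_ID": d.year * 10000 + d.month * 100 + d.day,
        "Full_Date": d,
        "Day_of_Week": DAY_NAMES[d.weekday()],
        "Day_of_Month": d.day,
        "Month_": d.month,
        # Months 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4.
        "Quarter_": f"Q{(d.month - 1) // 3 + 1}",
        "Year_": d.year,
    }


row = calendar_row(date(2015, 12, 31))
print(row["Day_of_Week"], row["Quarter_"], row["Calender_ID"])  # Thursday Q4 20151231
```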
Fact table (Ecom_Fact):
To analyze the insights we created one fact table, connected to each dimension table via a foreign-key relationship.
It has three aggregate columns for analyzing sales on this e-commerce website:
1) CategoryWise_Rev: the category-wise revenue generated from executed orders.
2) Brand_Rev: the brand-wise revenue generated from executed orders.
3) CategoryUnit_sold: the category-wise units sold.
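The three measures can be derived from order-level rows as sketched below in plain Python. Two assumptions are made: revenue is the sum of Discounted_Cost (as in the year-wise and quarter-wise SSRS queries in Appendix A), and units sold is the number of order rows per category. The `select distinct` used to read these columns in Appendix A suggests the totals are repeated on every fact row of the same category or brand, so the sketch stores them that way; the order rows are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical order rows: (category, brand, Discounted_Cost).
orders = [
    ("Jewellery", "BrandA", 500),
    ("Jewellery", "BrandB", 300),
    ("Furniture", "BrandB", 1200),
]

# Group totals. Assumptions: revenue = sum of Discounted_Cost,
# units sold = number of order rows in the category.
cat_rev = defaultdict(int)
brand_rev = defaultdict(int)
for category, brand, cost in orders:
    cat_rev[category] += cost
    brand_rev[brand] += cost
cat_units = Counter(category for category, _, _ in orders)

# Every fact row carries the totals of its own category and brand,
# so the same value repeats across rows of one group.
fact_rows = [
    {
        "Discounted_Cost": cost,
        "CategoryWise_Rev": cat_rev[category],
        "Brand_Rev": brand_rev[brand],
        "CategoryUnit_sold": cat_units[category],
    }
    for category, brand, cost in orders
]
for row in fact_rows:
    print(row)
```

Because group totals repeat on every row, summing CategoryWise_Rev over the fact table double counts (2800 here instead of 2000), which is why such columns are read with DISTINCT; the year-wise and quarter-wise reports instead aggregate Discounted_Cost with SUM and GROUP BY.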
4. REPORTS AND VISUALIZATIONS
RStudio is used to produce the data visualizations.
The Flipkart data warehouse (data mart) is connected to RStudio with the RODBC package.
Multiple packages, such as ggplot2, readr, dplyr, plotxy and ggthemes, are used to produce the visualizations.
4.1. VISUALIZATIONS
Data visualization is the technique of encoding information from data as visual objects (points, lines, bars or pie slices), because a single visual can convey more than a hundred words.
In this part of the project I make some quick visualizations to drill down into our sales dataset. For data visualization I used RStudio with the ggplot2 package, connected to the SQL Server database through the RODBC package.
Key components to consider in our Flipkart sales data:
➢ Product category
➢ Product name
➢ Brand name
➢ Total sales (by category / brand)
➢ Total revenue (by category / brand)
Reason behind the visualizations for the business requirements:
We can break revenue down by category or brand to justify the company's profit in long-term growth. This helps the company determine which product gives more profit or which one is in more demand. We can also show the highest-selling product over a specific time span (month, quarter or year), which helps to maintain product stock for customer satisfaction.
CATEGORY WISE UNIT SOLD
1. This bar graph visualizes the top 10 product categories by units sold. The bars vary widely in height: jewelry is the top-selling category with 3521 units sold, whereas the Kitchen & Dining and Tools & Hardware categories are the lowest-selling of the ten, with 362 and 386 units respectively.
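One way to check which categories the chart shows is to rank the CategoryUnit_sold values from the CatWise_UnitSold&Revenue report. In dplyr, top_n() without a wt argument ranks by the last column of the data frame, which in the Appendix A query is CategoryUnit_sold, so the chart is a top 10 by units sold rather than by revenue. The Python sketch below copies the twelve largest unit counts from that report (a subset, to keep it short) and takes the top 10 with heapq.nlargest.

```python
import heapq

# CategoryUnit_sold values copied from the CatWise_UnitSold&Revenue report
# (the twelve categories with the most units sold).
units_sold = {
    "Jewellery": 3521,
    "Mobiles & Accessories": 1097,
    "Automotive": 1002,
    "Clothing": 887,
    "Home Decor & Festive Needs": 859,
    "Home Furnishing": 699,
    "Computers": 572,
    "Baby Care": 455,
    "Tools & Hardware": 386,
    "Kitchen & Dining": 362,
    "Footwear": 191,
    "Furniture": 179,
}

# Rank by units sold, the column top_n() falls back to when wt is omitted.
top10 = heapq.nlargest(10, units_sold.items(), key=lambda item: item[1])
for name, units in top10:
    print(f"{name:28s} {units}")
```

Furniture, second by revenue in the same report, drops out of this ranking because it sold only 179 units; passing wt = CategoryWise_Rev to top_n() would give a revenue-based top 10 instead.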
HIERARCHY OF BRAND REVENUE
2. This graph shows brand revenue: the Karatcraft brand has the highest revenue, followed by Radiant Bay. By exploring this visualization the company can prepare its future development agenda.
[Figure: pie chart of unit sold. Slice labels: Jewellery = 21.48, Home decor = 16.12, Automotive = 2.14, Mobiles = 23.15, Clothing = 7.12]
3. This pie chart shows the proportion of products sold in the year 2015. In 2015 the highest-selling product category is jewelry and the lowest is home decor. The benefit of this visualization is that the company can identify the lowest-selling products in its product list, and then offer clearance discounts or adopt other strategies to attract new customers to those products.
4.2. REPORTS
We use SSRS to produce reports and display our results for the business questions.
The report below shows the hierarchical distribution of brand revenue by brand name.
BrandWise_Revenue
Brand Name | Brand Rev (Rupee)
Karatcraft | 5632898
Radiant Bay | 3601945
BlueStone | 2796709
Durian | 1480125
ARRA | 973097
Rakam | 774110
Jewels5 | 713743
Fullcutdiamond | 675287
Allure Auto | 663819
Shashvat Jewels | 544387
Diti Jewellery | 374850
GAGA | 370060
WearYourShine by PCJ | 324096
JacknJewel | 254698
Raymond | 254438
Slim | 244008
DailyObjects | 242640
P.N.Gadgil Jewellers | 208079
Highest_Sold_Order
Brand Name | Category Name | Order date | Discounted Cost | Product Price
ARRA | Furniture | 12/31/2015 12:00:00 AM | 57500 | 57500
| | | 51400 | 51400
| | | 61800 | 61800
| | | 53300 | 53300
| | | 68400 | 68400
| | | 65900 | 65900
| | | 86500 | 86500
Audeze | Mobiles & Accessories | 3/11/2016 12:00:00 AM | 116292 | 116292
Durian | Furniture | 12/31/2015 12:00:00 AM | 36660 | 56400
| | | 47775 | 73500
| | | 105300 | 162000
| | | 70200 | 108000
| | | 54795 | 84300
| | | 47970 | 73800
| | | 60840 | 93600
| | | 55575 | 85500
| | | 70785 | 108900
| | | 141375 | 217500
| | | 45045 | 69300
| | | 132990 | 204600
| | | 162825 | 250500
| | | 35295 | 54300
| | | 48945 | 75300
NITGEN | Pens & Stationery | 3/20/2016 12:00:00 AM | 44804 | 71687
| | | 36575 | 58520
The above report lists, for each brand and product category, the orders whose product price is greater than 50,000.
The report below shows the quarterly revenue generated by each category over 2015 to 2016.
Quarterly_Revenue(Cat_wise)
Category Name | Year | Quarter | Quarter-wise Rev
Automotive | 2015 | Q4 | 886717
| 2016 | Q1 | 180412
| | Q2 | 120520
Baby Care | 2015 | Q4 | 219256
| 2016 | Q1 | 52669
| | Q2 | 123318
Clothing | 2015 | Q4 | 290292
| 2016 | Q1 | 203585
| | Q2 | 543949
Computers | 2015 | Q4 | 834349
| 2016 | Q1 | 311177
| | Q2 | 47208
Furniture | 2015 | Q4 | 2592373
| 2016 | Q1 | 149487
| | Q2 | 337846
Home Decor & Festive Needs | 2015 | Q4 | 652441
| 2016 | Q1 | 876731
| | Q2 | 116100
Home Furnishing | 2015 | Q4 | 562086
| 2016 | Q1 | 258211
| | Q2 | 42985
Jewellery | 2015 | Q4 | 1416970
| 2016 | Q1 | 18040871
| | Q2 | 117968
Mobiles & Accessories | 2015 | Q4 | 212633
| 2016 | Q1 | 805051
| | Q2 | 32547
Tools & Hardware | 2015 | Q4 | 14131
| 2016 | Q1 | 310358
| | Q2 | 9700
Yearly_Category_Revenue
Category Name | Year | Yearly Revenue
Automation & Robotics | 2016 | 17000
Automotive | 2015 | 886717
| 2016 | 300932
Baby Care | 2015 | 219256
| 2016 | 175987
Bags | 2016 | 184939
Beauty and Personal Care | 2015 | 1687
| 2016 | 176954
Cameras & Accessories | 2015 | 72329
| 2016 | 21108
Clothing | 2015 | 290292
| 2016 | 747534
Computers | 2015 | 834349
| 2016 | 358385
Eyewear | 2016 | 12253
Food & Nutrition | 2016 | 1955
Footwear | 2016 | 129603
Furniture | 2015 | 2592373
| 2016 | 487333
Gaming | 2016 | 30714
Health & Personal Care Appliances | 2016 | 139669
Home & Kitchen | 2015 | 4348
| 2016 | 79113
Home Decor & Festive Needs | 2015 | 652441
| 2016 | 992831
Home Entertainment | 2015 | 25620
| 2016 | 30681
The report below shows the category-wise revenue and units sold over 2015 to 2016.
CatWise_UnitSold&Revenue
Category Name | Category Wise Rev | Category Unit sold
Jewellery | 19575809 | 3521
Furniture | 3079706 | 179
Home Decor & Festive Needs | 1645272 | 859
Computers | 1192734 | 572
Automotive | 1187649 | 1002
Mobiles & Accessories | 1050231 | 1097
Clothing | 1037826 | 887
Home Furnishing | 863282 | 699
Baby Care | 395243 | 455
Tools & Hardware | 334189 | 386
Kitchen & Dining | 297572 | 362
Toys & School Supplies | 211869 | 101
Pens & Stationery | 194837 | 173
Bags | 184939 | 151
Beauty and Personal Care | 178641 | 154
Health & Personal Care Appliances | 139669 | 43
Footwear | 129603 | 191
Sports & Fitness | 128618 | 107
Cameras & Accessories | 93437 | 72
Home Improvement | 87456 | 78
Home & Kitchen | 83461 | 24
Home Entertainment | 56301 | 19
Gaming | 30714 | 35
Watches | 24628 | 48
Automation & Robotics | 17000 | 1
Eyewear | 12253 | 10
Pet Supplies | 12194 | 29
Sunglasses | 10911 | 22
Food & Nutrition | 1955 | 1
Household Supplies | 1917 | 4
Wearable Smart Devices | 978 | 2
The above combined report shows the category-wise revenue together with the units sold.
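Because the quarterly, yearly and combined reports are all computed from the same fact table, their figures should agree: a category's quarters should add up to its overall Category Wise Rev, and so should its years. The short Python check below, with values copied from the three reports, is one way to confirm this; it covers only the categories that appear both in the quarterly (or yearly) report and in the combined report.

```python
# Figures copied from the Quarterly_Revenue(Cat_wise), Yearly_Category_Revenue
# and CatWise_UnitSold&Revenue reports.
quarterly = {  # 2015 Q4, 2016 Q1, 2016 Q2
    "Automotive": (886717, 180412, 120520),
    "Baby Care": (219256, 52669, 123318),
    "Clothing": (290292, 203585, 543949),
    "Computers": (834349, 311177, 47208),
    "Furniture": (2592373, 149487, 337846),
    "Home Decor & Festive Needs": (652441, 876731, 116100),
    "Home Furnishing": (562086, 258211, 42985),
    "Jewellery": (1416970, 18040871, 117968),
    "Mobiles & Accessories": (212633, 805051, 32547),
    "Tools & Hardware": (14131, 310358, 9700),
}
yearly = {  # 2015, 2016
    "Automotive": (886717, 300932),
    "Baby Care": (219256, 175987),
    "Clothing": (290292, 747534),
    "Computers": (834349, 358385),
    "Furniture": (2592373, 487333),
    "Home Decor & Festive Needs": (652441, 992831),
    "Home Entertainment": (25620, 30681),
}
totals = {  # Category Wise Rev
    "Automotive": 1187649,
    "Baby Care": 395243,
    "Clothing": 1037826,
    "Computers": 1192734,
    "Furniture": 3079706,
    "Home Decor & Festive Needs": 1645272,
    "Home Entertainment": 56301,
    "Home Furnishing": 863282,
    "Jewellery": 19575809,
    "Mobiles & Accessories": 1050231,
    "Tools & Hardware": 334189,
}


def mismatches(parts, expected):
    """Return the categories whose parts do not add up to the expected total."""
    return [name for name, values in parts.items() if sum(values) != expected[name]]


print(mismatches(quarterly, totals))  # []
print(mismatches(yearly, totals))     # []
```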
5. XML AND SCHEMA
1.a. XML of Brand Dimension
1.b. XSD document of Brand Dimension
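The XML and XSD documents themselves are shown only as screenshots. As an illustration of the usual shape of such an export, the sketch below serializes Brand_Dim rows with Python's standard xml.etree.ElementTree module; the element names and sample rows are hypothetical. In SQL Server a comparable document can be produced with a query such as SELECT Brand_Key, Brand_ID, Brand_Name FROM Brand_Dim FOR XML PATH('Brand'), ROOT('Brand_Dim'). Validating against an XSD is not supported by the standard library and would need a third-party package such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical Brand_Dim rows: (Brand_Key, Brand_ID, Brand_Name).
brand_rows = [(1, "B10", "BrandA"), (2, "B20", "BrandB")]

root = ET.Element("Brand_Dim")
for key, brand_id, name in brand_rows:
    # One <Brand> element per row, one child element per column.
    brand = ET.SubElement(root, "Brand")
    ET.SubElement(brand, "Brand_Key").text = str(key)
    ET.SubElement(brand, "Brand_ID").text = brand_id
    ET.SubElement(brand, "Brand_Name").text = name

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Round trip: parse the text back and read the brand names.
parsed = ET.fromstring(xml_text)
names = [b.findtext("Brand_Name") for b in parsed.findall("Brand")]
print(names)  # ['BrandA', 'BrandB']
```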
6. GRAPH DATABASES
• Neo4j is a graph database. Graphs are structures containing vertices (entities) and edges (connections between vertices).
• Neo4j stores data as key-value pairs: properties can hold values such as strings, numbers or Booleans.
• Graph databases are usually schema-less, which gives the flexibility of a document or key-value store database, while still supporting relationships in a similar manner to a traditional relational database.
• Below are the graph and the code written to load the dataset into Neo4j. We loaded the CSV files of the required tables and matched them according to the corresponding data tables.
We first created nodes for the required tables in Neo4j, created constraints, and matched them to the required tables.
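Conceptually, the load-then-link steps do two things: LOAD CSV ... CREATE turns each CSV row into a node whose properties are strings, and MATCH ... WHERE ... CREATE stores a relationship between nodes whose IDs match, so later queries can follow that relationship instead of comparing keys again. The Python sketch below mimics this for the Category to Fact_table link using dictionaries; the CSV contents and the `links` field are hypothetical, and it illustrates the idea rather than how Neo4j stores data internally.

```python
import csv
import io

# Hypothetical CSV contents standing in for Category_dim.csv and Fact_dim.csv.
category_csv = "Category_ID,Category_Name\nC1,Jewellery\nC2,Furniture\n"
fact_csv = "Od_ID,Category_ID\nO1,C1\nO2,C1\nO3,C2\n"

# LOAD CSV WITH HEADERS ... CREATE: one node per row; every property is a string.
categories = list(csv.DictReader(io.StringIO(category_csv)))
facts = list(csv.DictReader(io.StringIO(fact_csv)))

# A uniqueness constraint on Category_ID is backed by an index; a dict keyed
# by Category_ID plays that role here.
category_by_id = {c["Category_ID"]: c for c in categories}

# MATCH (s:Category), (p:Fact_table) WHERE s.Category_ID = p.Category_ID
# CREATE (s)-[:Category_Name]->(p): store a direct link from category to fact.
for c in categories:
    c["links"] = []
for f in facts:
    category_by_id[f["Category_ID"]]["links"].append(f)

# Traversal: follow the stored links instead of comparing IDs again.
jewellery = category_by_id["C1"]
orders = [f["Od_ID"] for f in jewellery["links"]]
print(jewellery["Category_Name"], orders)  # Jewellery ['O1', 'O2']
```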
❖ Load Brand Dimension
NEO4J VS RELATIONAL DATABASE
Neo4j code to create relation (join) between Brand-Order –Fact
Match(p:Fact_table),(b:Brand),(o:Od) where p.Brand_ID=b.Brand_ID and
p.Od_ID=o.Od_ID return p,b,o
SQL Query
The above SQL query returns Brand_ID, Order_ID and Discounted_Cost. The SQL output is not graphical, whereas the Neo4j output is interactive and visually engaging.
In our experience the Neo4j code was simpler and easier to understand, whereas the SQL code was more tedious and took time to work out how to relate tables and columns. The outputs are also very different: Neo4j presents the result as a graph, whereas SQL shows only rows.
In Neo4j we retrieve data by accessing the respective nodes and following their relationships, whereas in a relational database we retrieve data by running a SELECT query on tables.
While implementing, we observed that Cypher queries in Neo4j were easier to work with than SQL, because in Neo4j relationships are created directly between nodes (MATCH ... CREATE), whereas a relational database relies on concepts such as foreign keys and surrogate keys.
7. CONCLUSIONS
• In this project we focused on order transactions on an e-commerce website, and extracted the major components by cleaning the dataset.
• We analysed Flipkart (e-commerce website) order transaction data from 2015 to 2016 and found the hierarchy of revenue generated by brand and by product category.
• We visualized how each factor, modelled as a dimension, has a direct or indirect impact on product sales.
APPENDIX A – VISUALIZATIONS CODE
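install.packages("RODBC")
library(RODBC)
library(ggplot2)
library(dplyr)
# Keep the connection string on one line: a line break inside the literal breaks the driver name.
# The database name must match the warehouse actually created (Section 2 creates FKART_DW).
myconn <- odbcDriverConnect(connection = "Driver={SQL Server};server=SHREEM;database=FKARTDW;trusted_connection=yes;")
myconn
######################################################################
# To get category wise Unit Sold (Bar_Chart)
rd <- sqlQuery(myconn, "
select distinct b.Category_Name, a.CategoryWise_Rev, a.CategoryUnit_sold
from Ecom_Fact a, Category_Dim b, Calender_Dim c
where a.Category_Key = b.Category_Key
and a.Calender_Key = c.Calender_Key
order by a.CategoryWise_Rev desc")
rd
# top_n() without wt ranks by the last column (CategoryUnit_sold); stated explicitly here
dt <- rd %>% top_n(10, CategoryUnit_sold)
# '+' must end the line; a line that starts with '+' is parsed as a new expression
ggplot(dt, aes(Category_Name, CategoryUnit_sold, label = CategoryUnit_sold)) +
  geom_bar(stat = "identity", fill = "red") +
  geom_text(vjust = 2)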
# To get the Brand wise revenue (Bar_Chart)
rvn <- sqlQuery(myconn, "select distinct b.Brand_Name, a.Brand_Rev from Ecom_Fact a, Brand_Dim b
where a.Brand_Key = b.Brand_Key
order by a.Brand_Rev desc")
rvn
bo <- rvn %>% top_n(10, Brand_Rev)
# Vertical bars: brands along the x-axis, revenue on the y-axis
barplot(bo$Brand_Rev, main = "Brand Wise Revenue",
        xlab = "Brand", ylab = "Revenue (Rupee)", names.arg = bo$Brand_Name,
        col = rainbow(length(unique(bo$Brand_Name))),
        legend.text = unique(bo$Brand_Name),
        args.legend = list(horiz = TRUE, x = "topleft"))
######################################################################
# To get category wise Unit Sold (Pie_Chart)
tu <- sqlQuery(myconn, "select distinct b.Category_Name, a.CategoryWise_Rev, a.CategoryUnit_sold
from Ecom_Fact a, Category_Dim b, Calender_Dim c
where a.Category_Key = b.Category_Key
and a.Calender_Key = c.Calender_Key
order by a.CategoryWise_Rev desc")
tu
to <- tu %>% select(Category_Name, CategoryUnit_sold) %>% top_n(5, CategoryUnit_sold)
to
slices <- to$CategoryUnit_sold
# as.character() keeps category names (not factor codes) as labels
lbls <- as.character(to$Category_Name)
pct <- round(slices / sum(slices) * 100)
lbls <- paste0(lbls, " ", pct, "%")
pie(slices, labels = lbls, col = rainbow(length(lbls)),
    main = "pie chart of unit sold")
######################################################################
SSRS Queries
-- Brand wise revenue in the years 2015 and 2016
select distinct b.Brand_Name, a.Brand_Rev
from Ecom_Fact a, Brand_Dim b
where a.Brand_Key = b.Brand_Key
order by a.Brand_Rev desc
-- Total unit sold and revenue for all categories in 2015 and 2016
select distinct b.Category_Name, a.CategoryWise_Rev, a.CategoryUnit_sold
from Ecom_Fact a, Category_Dim b, Calender_Dim c
where a.Category_Key = b.Category_Key
and a.Calender_Key = c.Calender_Key
order by a.CategoryWise_Rev desc
-- Year wise category revenue
select b.Category_Name, c.Year_, SUM(a.Discounted_Cost) AS yearWise_CatRev
from Ecom_Fact a, Category_Dim b, Calender_Dim c
where a.Category_Key = b.Category_Key
and a.Calender_Key = c.Calender_Key
group by b.Category_Name, c.Year_
order by b.Category_Name
-- Week wise revenue of all categories in the years 2015 and 2016
select c.Category_Name, b.Year_, b.Day_of_Week, SUM(a.Discounted_Cost) AS WeekWise_JewlREv
from Ecom_Fact a, Calender_Dim b, Category_Dim c
where a.Calender_Key = b.Calender_Key
and a.Category_Key = c.Category_Key
group by c.Category_Name, b.Year_, b.Day_of_Week
order by c.Category_Name desc
-- Quarter wise revenue of the top 10 categories in 2015 and 2016
select c.Category_Name, b.Year_, b.Quarter_, SUM(a.Discounted_Cost) AS QuatWise_Rev
from Ecom_Fact a, Calender_Dim b, Category_Dim c
where a.Calender_Key = b.Calender_Key
and a.Category_Key = c.Category_Key
and c.Category_ID in (10015, 10001, 10006, 10020, 10007, 10012, 10004, 10010, 10011, 10008)
group by c.Category_Name, b.Year_, b.Quarter_
order by c.Category_Name desc
-- Highest_Sold_Order: which brands got orders for products whose retail cost is more than 50000
select b.Brand_Name, c.Category_Name, c.Order_date, c.Order_Details,
c.Discounted_Cost, MAX(a.Retail_Cost) AS Product_Price
from Ecom_Fact a, Brand_Dim b, Order_Dim c
where a.Brand_Key = b.Brand_Key
and a.Order_Key = c.Order_Key
group by b.Brand_Name, c.Category_Name, c.Order_date, c.Order_Details, c.Discounted_Cost
having MAX(a.Retail_Cost) > 50000
order by Product_Price desc
APPENDIX B – NEO 4J CODE
Queries for Neo4j
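❖ Load Brand Dimension:
LOAD CSV WITH HEADERS FROM "file:///Brand_dim.csv" AS row
CREATE (b:Brand)
SET b = row {Brand_ID: row.Brand_ID, Brand_Name: row.Brand_Name}
RETURN b
// The label must match the nodes created above (Brand, not brand)
CREATE CONSTRAINT ON (b:Brand) ASSERT b.Brand_ID IS UNIQUE
// Neo4j 4.4+ syntax: CREATE CONSTRAINT FOR (b:Brand) REQUIRE b.Brand_ID IS UNIQUE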
❖ Load Calendar Dimension
LOAD CSV WITH HEADERS FROM "file:///Calender_dim.csv" AS row
CREATE (d:Date)
SET d = row {Calender_ID: row.Calender_ID, Full_Date: row.Full_Date, Day_of_Week: row.Day_of_Week,
Month_: row.Month_, Quarter_: row.Quarter_, Year_: row.Year_}
RETURN d
CREATE CONSTRAINT ON (d:Date) ASSERT d.Calender_ID IS UNIQUE
❖ Load Category Dimension
LOAD CSV WITH HEADERS FROM "file:///Category_dim.csv" AS row
CREATE (c:Category)
SET c = row {Category_ID: row.Category_ID, Category_Name: row.Category_Name}
RETURN c
CREATE CONSTRAINT ON (c:Category) ASSERT c.Category_ID IS UNIQUE
❖ Load Fact Dimension
LOAD CSV WITH HEADERS FROM "file:///Fact_dim.csv" AS row
CREATE (f:Fact_table)
SET f = row {Od_ID: row.Od_ID, Calender_ID: row.Calender_ID, Category_ID: row.Category_ID,
Brand_ID: row.Brand_ID, Retail_Cost: row.Retail_Cost, Discounted_Cost: row.Discounted_Cost,
Brand_Rev: row.Brand_Rev, CategoryUnit_sold: row.CategoryUnit_sold}
❖ Load Order Dimension
LOAD CSV WITH HEADERS FROM "file:///Order_dim.csv" AS row
CREATE (o:Od)
SET o = row {Od_ID: row.Od_ID, Od_date: row.Od_date, URL_: row.URL_, Category_Name: row.Category_Name,
Od_Details: row.Od_Details, Retail_Cost: row.Retail_Cost, Discounted_Cost: row.Discounted_Cost}
RETURN o
CREATE CONSTRAINT ON (o:Od) ASSERT o.Od_ID IS UNIQUE
Relationship queries
❖ Connect Category Dimension to Fact Dimension
MATCH (s:Category), (p:Fact_table) WHERE s.Category_ID = p.Category_ID
CREATE (s)-[r:Category_Name]->(p) RETURN s, p, r
❖ Connect Calendar Dimension to Fact Dimension
// Calendar nodes were created with the label Date
MATCH (s:Date), (p:Fact_table) WHERE s.Calender_ID = p.Calender_ID
CREATE (s)-[r:Quarter_]->(p) RETURN s, p, r
❖ Connect Order Dimension with Fact Dimension
MATCH (s:Od), (p:Fact_table) WHERE s.Od_ID = p.Od_ID
CREATE (s)-[r:Retail_Cost]->(p) RETURN s, p, r
❖ Query to find count of brand name
MATCH (n:Brand) RETURN count(n.Brand_Name)
❖ Query to find brand_name whose name starts with R.
match (c:Brand) where c.Brand_Name starts with "R" return c
❖ Interconnect query connect Order_Brand_Fact
Match(p:Fact_table),(b:Brand),(o:Od) where p.Brand_ID=b.Brand_ID and
p.Od_ID=o.Od_ID return p,b,o