Question:
For the following vectors x and y, calculate the cosine similarity and the Euclidean distance:
x = (4, 4, 4, 4), y = (2, 2, 2, 2)
Solution:
Cosine
x · y = 4*2 + 4*2 + 4*2 + 4*2 = 32
||x|| = sqrt(4*4 + 4*4 + 4*4 + 4*4) = sqrt(64) = 8
||y|| = sqrt(2*2 + 2*2 + 2*2 + 2*2) = sqrt(16) = 4
cos(x, y) = (x · y) / (||x|| * ||y||) = 32 / (8*4)
cos(x, y) = 1
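The cosine calculation can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name cosine_similarity is chosen here for illustration and does not come from the question.

```python
import math

def cosine_similarity(x, y):
    """Return (x . y) / (||x|| * ||y||) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))      # dot product x . y
    norm_x = math.sqrt(sum(a * a for a in x))   # length of x
    norm_y = math.sqrt(sum(b * b for b in y))   # length of y
    return dot / (norm_x * norm_y)

x = (4, 4, 4, 4)
y = (2, 2, 2, 2)
print(cosine_similarity(x, y))  # 1.0
```

A value of 1 means the two vectors point in the same direction, which is expected here because x is exactly 2 times y.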
Euclidean
d(x, y) = sqrt((4−2)^2 + (4−2)^2 + (4−2)^2 + (4−2)^2) = sqrt(4 + 4 + 4 + 4) = sqrt(16)
Euclidean distance = 4
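The Euclidean distance can be verified the same way. A minimal sketch, assuming plain Python tuples for the vectors; the helper name euclidean_distance is illustrative.

```python
import math

def euclidean_distance(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

x = (4, 4, 4, 4)
y = (2, 2, 2, 2)
print(euclidean_distance(x, y))  # 4.0
```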
Question:
Consider the one-dimensional data set shown in the table below.
X:   0.6   3.2   4.5   4.6   4.9   5.2   5.6   5.8   7.1   9.5
Y:   −     −     +     +     +     −     −     +     −     −
Classify the data point x = 5.0 according to its 3 and 9 nearest neighbors (using majority vote).
Answer:
First, find the absolute difference between each data point X and x = 5.0, as listed in the table below.
x     X     Difference |x − X|   Y
5.0   0.6   4.4                  −
5.0   3.2   1.8                  −
5.0   4.5   0.5                  +
5.0   4.6   0.4                  +
5.0   4.9   0.1                  +
5.0   5.2   0.2                  −
5.0   5.6   0.6                  −
5.0   5.8   0.8                  +
5.0   7.1   2.1                  −
5.0   9.5   4.5                  −
Using the 3-nearest-neighbor method:
The 3 closest points to x = 5.0 are those with the smallest differences: 4.9, 5.2, 4.6
Classes: +, −, +
By majority vote (2 to 1), the 3-nearest-neighbor class is +.
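The nearest-neighbor vote can be checked for any k with a short script. This is only a sketch: the labels are written as the ASCII strings "+" and "-", and the names data and knn_classify are made up for illustration. With two classes and an odd k there are no tied votes, so no tie-breaking rule is needed here.

```python
from collections import Counter

# (X value, class label) pairs taken from the table in the question.
data = [(0.6, "-"), (3.2, "-"), (4.5, "+"), (4.6, "+"), (4.9, "+"),
        (5.2, "-"), (5.6, "-"), (5.8, "+"), (7.1, "-"), (9.5, "-")]

def knn_classify(query, k, points=data):
    """Classify query by majority vote among its k nearest points."""
    # Sort by absolute difference to the query and keep the k closest.
    nearest = sorted(points, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify(5.0, 3))  # +
print(knn_classify(5.0, 9))  # -
```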
Using the 9-nearest-neighbor method:
The 9 closest points to x = 5.0 are those with the smallest differences: 4.9, 5.2, 4.6, 4.5, 5.6, 5.8, 3.2, 7.1, 0.6
Classes: +, −, +, +, −, +, −, −, −
By majority vote (5 to 4), the 9-nearest-neighbor class is −.
Question:
Suppose a group of 12 sales price records has been sorted as follows:
5; 10; 11; 13; 15; 35; 50; 55; 72; 90; 204; 215.
Partition them into three bins by each of the following methods.
(a) equal-frequency partitioning
(b) equal-width partitioning
(c) clustering
Answer:
(a) equal-frequency (equi-depth) partitioning:
Partition the 12 values into three equi-depth bins of depth 12/3 = 4:
Bin 1: 5, 10, 11, 13
Bin 2: 15, 35, 50, 55
Bin 3: 72, 90, 204, 215
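Equal-frequency binning amounts to slicing the sorted list into chunks of equal size. A minimal sketch, assuming the number of values divides evenly by the number of bins (as it does here, 12/3 = 4); the function name equal_frequency_bins is illustrative.

```python
def equal_frequency_bins(values, n_bins):
    """Split sorted values into n_bins bins holding the same number of items."""
    values = sorted(values)
    depth = len(values) // n_bins  # assumes len(values) is divisible by n_bins
    return [values[i * depth:(i + 1) * depth] for i in range(n_bins)]

prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 90, 204, 215]
for number, contents in enumerate(equal_frequency_bins(prices, 3), start=1):
    print(f"Bin {number}: {contents}")
# Bin 1: [5, 10, 11, 13]
# Bin 2: [15, 35, 50, 55]
# Bin 3: [72, 90, 204, 215]
```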
(b) equal-width partitioning:
Partitioning the data into 3 equal-width bins requires a width of (215 − 5)/3 = 70.
Starting from the minimum value 5, the intervals are [5, 75), [75, 145), [145, 215].
Bin 1: 5, 10, 11, 13, 15, 35, 50, 55, 72
Bin 2: 90
Bin 3: 204, 215
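Equal-width binning with exactly three bins can be sketched as follows. Two assumptions are made here that the question does not state: the first interval starts at the minimum value 5, and the maximum value 215 is placed in the last bin rather than opening a fourth one. Under those assumptions the intervals are [5, 75), [75, 145), [145, 215]. The function name equal_width_bins is illustrative.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # (215 - 5) / 3 = 70 for the price data
    bins = [[] for _ in range(n_bins)]
    for v in sorted(values):
        # Clamp the index so the maximum value falls in the last bin.
        index = min(int((v - lo) // width), n_bins - 1)
        bins[index].append(v)
    return bins

prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 90, 204, 215]
for number, contents in enumerate(equal_width_bins(prices, 3), start=1):
    print(f"Bin {number}: {contents}")
# Bin 1: [5, 10, 11, 13, 15, 35, 50, 55, 72]
# Bin 2: [90]
# Bin 3: [204, 215]
```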
(c) clustering:
Using k-means clustering (k = 3) to partition the data into three bins, we get:
Bin 1: 5, 10, 11, 13, 15, 35
Bin 2: 50, 55, 72, 90
Bin 3: 204, 215
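The clustering result can be reproduced with a small one-dimensional k-means loop (Lloyd's algorithm). This is a sketch, not the only possible run: the starting centroids 5, 50 and 215 are an assumption made here, and k-means can converge to a different partition from a different start. The function name kmeans_1d is illustrative.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Plain k-means on numbers: assign to nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for v in values:
            # Index of the centroid closest to v.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # New centroid = mean of the cluster; keep the old one if a cluster is empty.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # assignments have stopped changing
            break
        centroids = new_centroids
    return clusters, centroids

prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 90, 204, 215]
clusters, centroids = kmeans_1d(prices, [5, 50, 215])  # assumed starting centroids
for number, contents in enumerate(clusters, start=1):
    print(f"Bin {number}: {contents}")
# Bin 1: [5, 10, 11, 13, 15, 35]
# Bin 2: [50, 55, 72, 90]
# Bin 3: [204, 215]
```

With this start, the value 35 begins in the middle cluster and moves to the first one after the centroids are recomputed, after which the assignment is stable.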