KNN algorithm using the Iris dataset

Monalisha · July 1, 2022, 4:21pm

I am writing code in a Jupyter notebook for the KNN algorithm, and one step raises an error that I don't know how to fix. Can anyone help? Here is the code:

# Calculating errors
train_error = []
test_error = []

for k in range(1, 31):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train_scaled, y_train)

    y_pred1 = knn.predict(X_train_scaled)
    train_error.append(np.mean(y_train != y_pred1))

    y_pred2 = knn.predict(X_test_scaled)
    test_error.append(np.mean(y_test != y_pred2))

# Plotting error curve
plt.figure(figsize=(10, 5))
plt.plot(range(1, 15), train_error, color='b', label="Train")
plt.plot(range(1, 15), test_error, color='r', label="Test")
plt.xlabel('Number of nearest neighbors (k)', fontsize=14)
plt.ylabel('Error', fontsize=14)
plt.title('Finding optimal value of K using error curves', fontsize=18, pad=15)
plt.legend()
plt.show()

And here is the error:

ValueError                                Traceback (most recent call last)
Input In [8], in <cell line: 17>()
     15 # Plotting error curve
     16 plt.figure(figsize=(10, 5))
---> 17 plt.plot(range(1, 15), train_error, color='b', label="Train")
     18 plt.plot(range(1, 15), test_error, color='r', label="Test")
     19 plt.xlabel('Number of nearest neighbors (k)', fontsize=14)

File ~\anaconda3\lib\site-packages\matplotlib\pyplot.py:2757, in plot(scalex, scaley, data, *args, **kwargs)
   2755 @_copy_docstring_and_deprecators(Axes.plot)
   2756 def plot(*args, scalex=True, scaley=True, data=None, **kwargs):
-> 2757     return gca().plot(
   2758         *args, scalex=scalex, scaley=scaley,
   2759         **({"data": data} if data is not None else {}), **kwargs)

File ~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py:1632, in Axes.plot(self, scalex, scaley, data, *args, **kwargs)
   1390 """
   1391 Plot y versus x as lines and/or markers.
   1392
   (…)
   1629 ('green') or hex strings ('#008000').
   1630 """
   1631 kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D)
-> 1632 lines = [*self._get_lines(*args, data=data, **kwargs)]
   1633 for line in lines:
   1634     self.add_line(line)

File ~\anaconda3\lib\site-packages\matplotlib\axes\_base.py:312, in _process_plot_var_args.__call__(self, data, *args, **kwargs)
    310     this += args[0],
    311     args = args[1:]
--> 312 yield from self._plot_args(this, kwargs)

File ~\anaconda3\lib\site-packages\matplotlib\axes\_base.py:498, in _process_plot_var_args._plot_args(self, tup, kwargs, return_kwargs)
    495     self.axes.yaxis.update_units(y)
    497 if x.shape[0] != y.shape[0]:
--> 498     raise ValueError(f"x and y must have same first dimension, but "
    499                      f"have shapes {x.shape} and {y.shape}")
    500 if x.ndim > 2 or y.ndim > 2:
    501     raise ValueError(f"x and y can be no greater than 2D, but have "
    502                      f"shapes {x.shape} and {y.shape}")

ValueError: x and y must have same first dimension, but have shapes (14,) and (30,)

ptrblck · July 2, 2022, 1:38am

The error is raised in `plt.plot(range(1, 15), train_error, color='b', label="Train")` because the x value (the first argument, `range(1, 15)`) has a shape of (14,) while the y value (the second argument, `train_error`) has a shape of (30,). The loop runs over `range(1, 31)` and appends one error per k, so both error lists end up with 30 entries.
You could try to use `range(len(train_error))` or remove the first argument.
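
As a rough sketch of that fix: the x values can be derived from the length of the error list, so the two can never get out of sync again. Note that `range(len(train_error))` starts at 0, so the x-axis would be shifted by one relative to k; starting the range at 1 keeps the axis aligned with the k values used in the loop. The helper names `k_axis` and `plot_error_curves` are hypothetical (they are not part of matplotlib or scikit-learn), and the zero-filled list at the end is only a placeholder standing in for the 30 errors collected in the loop.

```python
def k_axis(errors, first_k=1):
    """Return the k values that match a list holding one error per k."""
    return range(first_k, first_k + len(errors))


def plot_error_curves(train_error, test_error, first_k=1):
    """Plot train/test error against k, with x derived from the data length."""
    # Imported here so that k_axis can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    ks = k_axis(train_error, first_k)
    plt.figure(figsize=(10, 5))
    plt.plot(ks, train_error, color='b', label="Train")
    plt.plot(ks, test_error, color='r', label="Test")
    plt.xlabel('Number of nearest neighbors (k)', fontsize=14)
    plt.ylabel('Error', fontsize=14)
    plt.legend()
    plt.show()


# Placeholder values only: stands in for the 30 errors appended in the loop.
placeholder_errors = [0.0] * 30
ks = k_axis(placeholder_errors)
print(len(ks), ks[0], ks[-1])  # 30 1 30
```

In the notebook this would be called as `plot_error_curves(train_error, test_error)` after the loop has run. Simply changing both `range(1, 15)` calls to `range(1, 31)` should work as well, but it has to be updated by hand whenever the loop range changes.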

Note that your use case doesn’t seem to be PyTorch-related, so you might get a faster response in other discussion boards.

PS: you can post code snippets by wrapping them into three backticks ```, which makes debugging easier.