Data Analysis


NumPy

Python's built-in lists behave much like arrays, but they differ in some respects; for example, they are slower for numerical work on large collections.

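NumPy arrays can be much faster than Python lists for numerical work. A figure of up to 50 times faster is often quoted, though the actual speedup depends on the operation and the size of the data.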

NumPy is faster because it is implemented in C, its arrays hold elements of a single data type in contiguous memory, and operations are applied to whole arrays at once (vectorization) rather than in Python-level loops.
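
The speed difference described above can be checked with a small timing sketch. This is only an illustration: the array size, the squaring operation, and the names py_list and np_arr are assumptions chosen for the example, and the measured ratio will differ from machine to machine.

```python
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # explicit 64-bit ints so the squares do not overflow

# Square every element: a Python-level loop versus one vectorized expression.
list_result = [v * v for v in py_list]
array_result = np_arr * np_arr

# Both approaches produce the same values.
same = list_result == array_result.tolist()
print(same)

# Time each approach; absolute numbers depend on the machine and NumPy build.
list_time = timeit.timeit(lambda: [v * v for v in py_list], number=20)
array_time = timeit.timeit(lambda: np_arr * np_arr, number=20)
print(f"list: {list_time:.4f}s, ndarray: {array_time:.4f}s")
```

On most machines the ndarray version finishes well ahead of the list comprehension, but the exact factor is not fixed.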

The array object in NumPy is called ndarray, and the functions the library provides for it make many operations easier and more efficient than the same operations on Python lists.

Arrays can be considered a primitive data structure for storing information, and they are very handy when complex operations must be performed on a large number of elements.

To use NumPy, import it under an alias with the as keyword:

  • Use the array() function to create an ndarray object.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
  • To create a 2-D array, pass a nested list:
arr = np.array([[1, 2, 3], [4, 5, 6]])
  • For a 3-D array, nest one level deeper:
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
  • To create an array with a minimum number of dimensions, use ndmin:
arr = np.array([1, 2, 3, 4], ndmin=5)  # ndmin sets the minimum number of dimensions of the result
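
To see what each of the calls above produces, the ndim and shape attributes of the resulting arrays can be inspected. The names a1, a2, a3 and a5 are introduced here only for the illustration.

```python
import numpy as np

a1 = np.array([1, 2, 3, 4, 5])
a2 = np.array([[1, 2, 3], [4, 5, 6]])
a3 = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
a5 = np.array([1, 2, 3, 4], ndmin=5)

# ndim is the number of dimensions; shape is the length along each dimension.
for name, value in [("a1", a1), ("a2", a2), ("a3", a3), ("a5", a5)]:
    print(name, value.ndim, value.shape)
# a1 1 (5,)
# a2 2 (2, 3)
# a3 3 (2, 2, 3)
# a5 5 (1, 1, 1, 1, 4)
```

Note that ndmin pads the shape with leading dimensions of length 1; it does not change the data.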

 

  • The numpy.asarray() function takes a list, a list of tuples, a tuple, or another Python sequence as input and converts it into an ndarray. For example:
x = [1,2,3] 

a = np.asarray(x)
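
One practical difference between asarray() and array() is worth a short sketch: when the input is already an ndarray, asarray() returns it unchanged, while array() makes a copy by default. The variable names t, b and c are assumptions made for this example.

```python
import numpy as np

x = [1, 2, 3]
a = np.asarray(x)                 # list -> new ndarray
t = np.asarray([(1, 2), (3, 4)])  # list of tuples -> 2-D ndarray
print(t.shape)  # (2, 2)

b = np.asarray(a)  # input is already an ndarray: no copy is made
c = np.array(a)    # np.array copies by default
print(b is a)  # True
print(c is a)  # False
```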

 

  • The np.arange() function returns an ndarray of evenly spaced values from start up to, but not including, stop.
numpy.arange(start, stop, step, dtype)

a=np.arange(1,10,2)  # [1 3 5 7 9]
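
A few more calls illustrate how the arguments interact. The variable names below are chosen only for this sketch.

```python
import numpy as np

only_stop = np.arange(5)                # start defaults to 0, step to 1
descending = np.arange(10, 0, -3)       # a negative step counts down; stop is still excluded
fractional = np.arange(0, 1, 0.25)      # a float step gives a float array
as_float = np.arange(3, dtype=float)    # dtype forces the element type

print(only_stop)    # [0 1 2 3 4]
print(descending)   # [10  7  4  1]
print(fractional)   # [0.   0.25 0.5  0.75]
print(as_float)     # [0. 1. 2.]
```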

 

A few of the operations on NumPy arrays are illustrated below.

NumPy's built-in functions are vectorized, so complex mathematical operations can be applied to whole arrays without writing explicit loops. Some of them are:

1. Power 

arr = np.array([1,2,3,4,5]) 
# for finding an array raised to some power
print(np.power(arr, 3))  #  [  1   8  27  64 125]
arr1 = np.array([1,2,3,4,5]) 
print(np.power(arr, arr1)) #  [   1    4   27  256 3125]

 

2. Addition/Subtraction 

# element-wise addition and subtraction
print(np.add(arr,[5,4,3,2,1])) #  [6 6 6 6 6]
print(np.subtract(arr,[5,4,3,2,1]))  #  [-4 -2  0  2  4]

 

3. Multiplication 

# multiplying every element by a scalar
print(3*arr)  #  [ 3  6  9 12 15]
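
The power, addition/subtraction and multiplication examples above use function calls, but the same element-wise operations are also available through Python's arithmetic operators. The following sketch checks that the two forms agree; the name other is introduced only for the example.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
other = np.array([5, 4, 3, 2, 1])

cubes = arr ** 3        # same as np.power(arr, 3)
sums = arr + other      # same as np.add(arr, other)
diffs = arr - other     # same as np.subtract(arr, other)
products = arr * other  # element-wise, same as np.multiply(arr, other)

print(cubes)     # [  1   8  27  64 125]
print(sums)      # [6 6 6 6 6]
print(diffs)     # [-4 -2  0  2  4]
print(products)  # [5 8 9 8 5]
```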

 

4. Dot product 

#dot product
print(np.dot(arr,arr1))  # 55
print(np.dot([[1,2],[3,4]],[5,6]))  #  [17 39]
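
For 1-D and 2-D arrays, the @ operator gives the same results as np.dot. A minimal sketch, with m and n as example matrices that do not appear in the text above:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
arr1 = np.array([1, 2, 3, 4, 5])
inner = arr @ arr1        # inner product of two 1-D arrays
print(inner)              # 55

m = np.array([[1, 2], [3, 4]])
n = np.array([[5, 6], [7, 8]])
matmul = m @ n            # matrix product of two 2-D arrays
print(matmul)
# [[19 22]
#  [43 50]]
```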

 

5. Slicing

# slicing: all rows, columns 1 to 3
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(arr[:, 1:4])
# [[2 3 4]
#  [7 8 9]]
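
Slices also accept a step and negative indices, and a basic slice is a view of the original array rather than a copy. The sketch below uses its own array name, grid, so it does not disturb the example above.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

every_other = grid[1, ::2]  # second row, every second column
last_two = grid[:, -2:]     # all rows, last two columns
print(every_other)  # [ 6  8 10]
print(last_two)
# [[ 4  5]
#  [ 9 10]]

# A basic slice shares memory with the original array.
view = grid[0, :2]
view[0] = 99
print(grid[0, 0])  # 99
```

If an independent copy is needed, call .copy() on the slice.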

 

6. Cross product

#cross product
print(np.cross([1,2,3],[4,5,6]))  # [-3  6 -3]
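
Two properties of the cross product can be verified on the same vectors: the result is perpendicular to both inputs, and swapping the arguments flips its sign. The names u, v and c are introduced for this sketch.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
c = np.cross(u, v)

# The cross product is perpendicular to both inputs, so both dot products are zero.
print(np.dot(c, u), np.dot(c, v))  # 0 0

# Swapping the arguments negates the result.
print(np.cross(v, u))  # [ 3 -6  3]
```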

 

Broadcasting:

Broadcasting refers to how NumPy handles arrays of different shapes during arithmetic operations. Shapes are compared from the trailing dimension backward, and two dimensions are compatible when they are equal or when one of them is 1. If the shapes are compatible, the smaller array is "broadcast" across the larger one. For example, when an array of shape (5,) is added to an array of shape (6,1), the result has shape (6,5): the (5,) array is treated as shape (1,5), and both operands are then stretched to (6,5).

import numpy as np

x = np.arange(6)
x2 = x.reshape(6,1)
y = np.ones(5)
y2 = np.ones((3,4))

print(x+2)  # simplest broadcasting, when a scalar is added to a vector
# [2 3 4 5 6 7]

print(y.shape)
# (5,)

print(x2 + y)
# [[1. 1. 1. 1. 1.]
#  [2. 2. 2. 2. 2.]
#  [3. 3. 3. 3. 3.]
#  [4. 4. 4. 4. 4.]
#  [5. 5. 5. 5. 5.]
#  [6. 6. 6. 6. 6.]]

# print(x2 + y2) -> it will give an error as the constraints for broadcasting are not followed.
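
The stretching that broadcasting performs can be made explicit with np.broadcast_to, and the failing case can be observed safely by catching the exception. This is a sketch of the behaviour, not of how NumPy implements it internally; the names explicit and failed are assumptions made for the example.

```python
import numpy as np

x2 = np.arange(6).reshape(6, 1)
y = np.ones(5)
y2 = np.ones((3, 4))

result = x2 + y
print(result.shape)  # (6, 5)

# Stretch both operands to the common shape by hand and add them.
explicit = np.broadcast_to(x2, (6, 5)) + np.broadcast_to(y, (6, 5))
print(np.array_equal(result, explicit))  # True

# Shapes (6, 1) and (3, 4) are incompatible: 6 and 3 differ and neither is 1.
try:
    x2 + y2
    failed = False
except ValueError as exc:
    failed = True
    print("broadcast error:", exc)
```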

 

NumPy is designed for scientific computation. Using it, a developer can perform the following operations:

  • Mathematical and logical operations on arrays.
  • Creation and manipulation of multi-dimensional arrays and matrices.
  • High-level mathematical functions that operate on these arrays and matrices.
  • Fourier transforms and routines for shape manipulation.
  • Linear algebra and random number generation, through built-in functions.
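
A brief sketch touching several of the capabilities listed above. The matrix A, the vector b, the seed, and the other values are arbitrary choices made for this illustration.

```python
import numpy as np

# Logical operation on arrays: which values lie strictly between 2 and 10?
values = np.array([1, 5, 8, 12])
mask = np.logical_and(values > 2, values < 10)
print(mask)  # [False  True  True False]

# Shape manipulation: view the same four values as a 2x2 matrix.
square = values.reshape(2, 2)

# Linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))  # True

# Random number generation with a seeded generator for reproducibility.
rng = np.random.default_rng(0)
samples = rng.random(3)  # three floats in [0, 1)

# Fourier transform of a unit impulse: every frequency component equals 1.
spectrum = np.fft.fft([1, 0, 0, 0])
print(np.allclose(spectrum, 1))  # True
```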
