Creating data frames using the pandas library

Deepak Nair
May 18, 2020

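A data frame is a two-dimensional table with labeled rows and columns. Each column is a pandas Series, and all the columns share the same row index.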
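
To make that concrete, here is a minimal sketch showing that a column pulled out of a data frame is itself a Series carrying the frame's row index. The fruit names and numbers are illustrative only.

```python
import pandas as pd

# Illustrative values only
df = pd.DataFrame({'price': [10, 15], 'sales': [5, 10]},
                  index=['apple', 'banana'])

# Selecting one column returns a pandas Series
price_column = df['price']
print(type(price_column).__name__)          # Series
print(price_column.index.equals(df.index))  # True
```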

Creating a data frame by combining series

#Import pandas library
import pandas as pd
#Creating first series
price_series_1 = pd.Series(data = [10, 15, 30, 9, 20, 10], index=['apple', 'banana', 'orange', 'grape', 'mango', 'lime'])
#Creating second series
sales_series_2 = pd.Series(data = [5, 10, 20, 90, 25], index=['apple', 'banana', 'orange', 'grape', 'mango'])
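#axis=1 places each series in its own column, aligned on the index labels
#sort=False keeps the row labels in their original order; sort=True would sort them
#labels missing from one series (here 'lime') are filled with NaN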
fruit_data_frame_1 = pd.concat([price_series_1, sales_series_2], axis=1, sort=False)
print(fruit_data_frame_1)
         0     1
apple   10   5.0
banana  15  10.0
orange  30  20.0
grape    9  90.0
mango   20  25.0
lime    10   NaN
#axis=0 stacks the series on top of each other; the result is a single Series, not a data frame
fruit_series_stacked = pd.concat([price_series_1, sales_series_2], axis=0, sort=False)
print(fruit_series_stacked)
apple     10
banana    15
orange    30
grape      9
mango     20
lime      10
apple      5
banana    10
orange    20
grape     90
mango     25
dtype: int64
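
The combined data frame above ends up with columns named 0 and 1. One way to get meaningful column names is the keys argument of pd.concat, sketched below with a shortened version of the fruit data. The variable names here are illustrative, not part of the original example.

```python
import pandas as pd

# Shortened, illustrative versions of the two series
price = pd.Series([10, 15, 30], index=['apple', 'banana', 'orange'])
sales = pd.Series([5, 10], index=['apple', 'banana'])

# keys= names the resulting columns instead of the default 0 and 1
fruit = pd.concat([price, sales], axis=1, keys=['price', 'sales'])
print(fruit)
```

The sales column holds floats because 'orange' has no sales value and is filled with NaN.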

Creating from a NumPy matrix

import pandas as pd
import numpy as np
#Creating from a random 3X3 matrix
data_frame_1 = pd.DataFrame(np.random.randn(3, 3))
print(data_frame_1)
          0         1         2
0  0.060806 -0.806543  0.859473
1 -0.136783 -0.711317 -0.101713
2 -0.259168  0.362533  0.080274
#Assigning row and column names
data_frame_2 = pd.DataFrame(np.random.randn(3, 3), index=['row1', 'row2', 'row3'], columns=['col1', 'col2', 'col3'])
print(data_frame_2)
          col1      col2      col3
row1 -0.283342 -0.307254  0.246655
row2 -1.157004  0.376876  0.355617
row3  0.512803  0.657995  0.413148
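
Because the values are random, the numbers will differ on every run. If repeatable output is wanted, one option is NumPy's seeded generator, sketched below. The seed value 42 is an arbitrary choice, and this is an alternative to np.random.randn rather than what the examples above use.

```python
import numpy as np
import pandas as pd

# A seeded generator produces the same "random" values on every run
rng = np.random.default_rng(seed=42)  # the seed value is arbitrary

data_frame = pd.DataFrame(rng.standard_normal((3, 3)),
                          index=['row1', 'row2', 'row3'],
                          columns=['col1', 'col2', 'col3'])
print(data_frame)
```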

Creating from a Python dictionary and writing to CSV

import pandas as pd
#Python dictionary
mydict = {'price': [100, 200, 300, 400],
          'sales': [10, 20, 30, 40]}
labels = ['apple', 'banana', 'orange', 'grape']
data_frame_3 = pd.DataFrame(mydict, index=labels)
print(data_frame_3)
        price  sales
apple     100     10
banana    200     20
orange    300     30
grape     400     40
#Writing to a CSV file (the row labels are written as the first column)
data_frame_3.to_csv('fruit_sales.csv')
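
As a quick check on how the dictionary maps onto the data frame, the sketch below uses a shortened, illustrative version of the same data. The dictionary keys become the column names and the labels become the row index. It also shows that calling to_csv without a file path returns the CSV text as a string, and that index=False leaves the row labels out.

```python
import pandas as pd

# Shortened, illustrative version of the dictionary
mydict = {'price': [100, 200], 'sales': [10, 20]}
df = pd.DataFrame(mydict, index=['apple', 'banana'])

print(list(df.columns))           # ['price', 'sales']
print(list(df.index))             # ['apple', 'banana']
print(df.loc['banana', 'price'])  # 200

# With no path, to_csv returns the CSV text instead of writing a file;
# index=False omits the row labels
csv_without_labels = df.to_csv(index=False)
print(csv_without_labels)
```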

Creating from a CSV file

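My CSV file is the fruit_sales.csv written in the previous section, and it contains the data below:

,price,sales
apple,100,10
banana,200,20
orange,300,30
grape,400,40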

import pandas as pd
#Loading from CSV
data_frame_5 = pd.read_csv('fruit_sales.csv', index_col=0)
#index_col=0 tells pandas to use the first column of the file (column 'A' in a spreadsheet) as the row index
print(data_frame_5)
        price  sales
apple     100     10
banana    200     20
orange    300     30
grape     400     40
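
To see what index_col=0 changes, the sketch below reads the same CSV text twice, once with it and once without. It uses io.StringIO with a shortened, illustrative copy of the data so that it runs without a file on disk.

```python
import io
import pandas as pd

# Shortened, illustrative CSV text in the same layout as fruit_sales.csv
csv_text = ",price,sales\napple,100,10\nbanana,200,20\n"

# With index_col=0 the first column becomes the row index
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
# Without it, pandas keeps the first column as data and numbers the rows
without_index = pd.read_csv(io.StringIO(csv_text))

print(list(with_index.index))       # ['apple', 'banana']
print(list(without_index.columns))  # ['Unnamed: 0', 'price', 'sales']
```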
