When working with large datasets in Python, we often need to combine multiple DataFrames. Pandas provides powerful functions for merging, joining, and concatenating datasets efficiently. In this tutorial, we will explore these techniques with practical examples.
Concatenation in Pandas is used to stack DataFrames either vertically (along rows) or horizontally (along columns).
import pandas as pd
data1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Amit', 'Priya']})
data2 = pd.DataFrame({'ID': [3, 4], 'Name': ['Rahul', 'Sneha']})
result = pd.concat([data1, data2], ignore_index=True)
print(result)data3 = pd.DataFrame({'Age': [25, 30]})
merged_data = pd.concat([data1, data3], axis=1)
print(merged_data)Merging is similar to SQL joins and allows combining DataFrames based on common columns.
dataA = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Amit', 'Priya', 'Rahul']})
dataB = pd.DataFrame({'ID': [2, 3, 4], 'Salary': [50000, 60000, 70000]})
merged_data = pd.merge(dataA, dataB, on='ID')
print(merged_data)left_join = pd.merge(dataA, dataB, on='ID', how='left')
print(left_join)right_join = pd.merge(dataA, dataB, on='ID', how='right')
print(right_join)outer_join = pd.merge(dataA, dataB, on='ID', how='outer')
print(outer_join)Pandas join() is used when combining DataFrames based on index.
df1 = pd.DataFrame({'Salary': [50000, 60000]}, index=['Amit', 'Priya'])
df2 = pd.DataFrame({'Age': [25, 30]}, index=['Amit', 'Priya'])
joined_df = df1.join(df2)
print(joined_df)Sign in to join the discussion and post comments.
Sign in