Pandas

See the code below the questions.

  1. The two primary data structures in pandas are Series and DataFrame: a Series is one-dimensional, while a DataFrame is two-dimensional.
  2. Use the read_csv() function to read a CSV file into a pandas DataFrame.
  3. Use bracket notation with the column name as the key, e.g. df['col'], to select a single column from a DataFrame.
  4. Use boolean indexing to filter rows in a pandas DataFrame based on a condition.
  5. Use the groupby() function to group rows in a pandas DataFrame by a particular column.
  6. Use aggregation functions such as sum() and mean() to aggregate data in a pandas DataFrame, often after a groupby().
  7. Use the fillna() function to handle missing values in a pandas DataFrame.
  8. Use the merge() function to merge two pandas DataFrames together.
  9. Use the to_csv() function to export a pandas DataFrame to a CSV file.
  10. A Series is a one-dimensional labeled array, while a DataFrame is a two-dimensional table with labeled rows and columns.
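A minimal sketch of the operations listed above, using a small hypothetical sales table read from an in-memory string (via io.StringIO) instead of a real file. The column names, values, and the managers table are made up for illustration; the comments refer to the question numbers.

```python
import io

import pandas as pd

# Hypothetical sales data; the blank "units" entry becomes NaN when read
csv_text = """region,product,units
East,apple,10
West,apple,
East,pear,4
West,pear,6
"""

# 2. read_csv accepts a file path or any file-like object
sales = pd.read_csv(io.StringIO(csv_text))

# 3. Bracket notation with one column name returns a Series
units = sales["units"]

# 1 and 10. A Series is 1-D, a DataFrame is 2-D
print(units.ndim, sales.ndim)

# 7. fillna replaces missing values (here the blank West/apple entry)
sales["units"] = sales["units"].fillna(0)

# 4. Boolean indexing keeps only the rows where the condition is True
east = sales[sales["region"] == "East"]

# 5 and 6. Group by a column, then aggregate each group with sum()
totals = sales.groupby("region")["units"].sum()

# 8. merge joins two DataFrames on a shared key column
managers = pd.DataFrame({"region": ["East", "West"], "manager": ["Ana", "Raj"]})
merged = sales.merge(managers, on="region", how="left")

# 9. to_csv with no path returns the CSV text; index=False drops the row index
csv_out = merged.to_csv(index=False)
```

Passing a file path such as 'example.csv' to read_csv() or to_csv() instead of a string buffer reads from or writes to disk.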
import pandas as pd
import matplotlib.pyplot as plt

# read data into a pandas DataFrame
df = pd.read_csv('/home/ryanm/vscode/csp/_notebooks/files/example.csv')

# extract x and y data from DataFrame
x = df['x_column'].to_numpy()
y = df['y_column'].to_numpy()

# plot data using matplotlib
plt.plot(x, y)
plt.xlabel('Time')
plt.ylabel('Shits')
plt.title('Amount of shits Ryan has given')

# show the graph
plt.show()
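The plotting code above depends on a local example.csv containing x_column and y_column. A self-contained variant, assuming hypothetical values for those two columns, reads the CSV from a string and uses matplotlib's object-oriented interface (fig, ax) with the non-interactive Agg backend so it also runs without a display:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; must be set before importing pyplot
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for example.csv
csv_text = """x_column,y_column
0,1
1,3
2,2
3,5
"""

df = pd.read_csv(io.StringIO(csv_text))

fig, ax = plt.subplots()
# A pandas Series can be passed to plot() directly; converting to NumPy is optional
line, = ax.plot(df["x_column"], df["y_column"])
ax.set_xlabel("Time")
ax.set_ylabel("Value")
ax.set_title("y_column over time")
```

In a notebook or desktop session, drop the matplotlib.use("Agg") line and call plt.show() to display the figure, or call fig.savefig('plot.png') to write it to a file.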

Data Analysis / Predictive Analysis

  1. Numpy and Pandas can be used to preprocess data for predictive analysis by performing tasks such as data cleaning, feature engineering, and data normalization.
  2. Machine learning algorithms for predictive analysis include regression, classification, clustering, and deep learning. They differ in their input, output, and performance on different types of data: regression predicts a continuous value, classification predicts a discrete label, clustering groups unlabeled data without a target, and deep learning uses multi-layer neural networks that can perform either regression or classification.
  3. Real-world applications of predictive analysis include fraud detection in finance, predictive maintenance in manufacturing, demand forecasting in retail, and personalized medicine in healthcare.
  4. Feature engineering involves selecting, transforming, and creating features to improve the performance of a machine learning model. It can improve model accuracy by removing noisy or irrelevant inputs and adding features that carry more predictive signal.
  5. Machine learning models can be deployed for real-time predictive analysis using tools such as APIs, microservices, and serverless functions.
  6. Limitations of Numpy and Pandas include high memory usage on large datasets (data is held in memory on a single machine) and limited support for distributed computing. Tools such as Apache Spark and Dask can be used for big data analysis instead.
  7. Predictive analysis can improve decision-making and optimize business processes by providing insights and predictions based on data. This can lead to cost savings, improved customer satisfaction, and increased revenue.
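A minimal sketch of the preprocessing steps named above (cleaning, feature engineering, normalization), followed by a simple regression fit, assuming a small hypothetical housing table. The columns and values are invented for illustration, and min-max scaling is only one of several normalization choices.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing value and one duplicate row
raw = pd.DataFrame({
    "sqft": [850, 1200, np.nan, 1200, 2000],
    "bedrooms": [2, 3, 3, 3, 4],
    "price": [150, 210, 190, 210, 330],
})

# Data cleaning: drop exact duplicate rows, fill missing sqft with the median
clean = raw.drop_duplicates().copy()
clean["sqft"] = clean["sqft"].fillna(clean["sqft"].median())

# Feature engineering: derive a new feature from existing columns
clean["sqft_per_bedroom"] = clean["sqft"] / clean["bedrooms"]

# Normalization: min-max scale each feature column to [0, 1] with NumPy
features = clean[["sqft", "bedrooms", "sqft_per_bedroom"]].to_numpy()
mins = features.min(axis=0)
maxs = features.max(axis=0)
scaled = (features - mins) / (maxs - mins)

# Regression: fit a straight line price = slope * sqft + intercept
slope, intercept = np.polyfit(clean["sqft"], clean["price"], deg=1)
predicted = slope * 1500 + intercept  # estimated price for a hypothetical 1500 sqft home
```

Min-max scaling is sensitive to outliers; standardizing each column to zero mean and unit variance is a common alternative.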

Numpy

from skimage import io
import matplotlib.pyplot as plt

photo = io.imread('../images/waldo.jpg')
print(type(photo))  # io.imread returns a numpy.ndarray

# display the full image
plt.imshow(photo)
plt.show()
# crop rows 210-349 and columns 425-499 (row slice first, then column slice)
plt.imshow(photo[210:350, 425:500])
plt.show()
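Because io.imread returns a NumPy array, cropping an image is plain array slicing. The sketch below uses a blank hypothetical array in place of waldo.jpg so it runs without the file. For a colour image the shape is (height, width, 3), and in photo[210:350, 425:500] the first slice selects rows while the second selects columns.

```python
import numpy as np

# Hypothetical stand-in for io.imread output: 400 rows x 600 columns x 3 RGB channels
photo = np.zeros((400, 600, 3), dtype=np.uint8)

# Indexing order is [rows (y), columns (x), channel]; omitted axes are kept whole
crop = photo[210:350, 425:500]
print(crop.shape)  # (140, 75, 3)

# Basic slicing returns a view, so writing to crop also modifies photo
crop[:, :, 0] = 255  # set the red channel of the cropped region to full intensity
```

Use crop.copy() when the cropped region should be edited without changing the original image.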