NbShare
  • Nbshare Notebooks

  • Table of Contents

  • Python Utilities

    • How To Install Jupyter Notebook
    • How to Upgrade Python Pip
    • How To Use Python Pip
  • Python

    • Python Datetime
    • Python Dictionary
    • Python Generators
    • Python Iterators and Generators
    • Python Lambda
    • Python Sort List
    • String And Literal In Python 3
    • Strftime and Strptime In Python
    • Python Tkinter
    • Python Underscore
    • Python Yield
  • Pandas

    • Aggregating and Grouping
    • DataFrame to CSV
    • DF to Numpy Array
    • Drop Columns of DF
    • Handle Json Data
    • Iterate Over Rows of DataFrame
    • Merge and Join DataFrame
    • Pivot Tables
    • Python List to DataFrame
    • Rename Columns of DataFrame
    • Select Rows and Columns Using iloc, loc and ix
    • Sort DataFrame
  • PySpark

    • Data Analysis With Pyspark
    • Read CSV
    • RDD Basics
  • Data Science

    • Confusion Matrix
    • Decision Tree Regression
    • Logistic Regression
    • Regularization Techniques
    • SVM Sklearn
    • Time Series Analysis Using ARIMA
  • Machine Learning

    • How To Code RNN and LSTM Neural Networks in Python
    • PyTorch Beginner Tutorial Tensors
    • Rectified Linear Unit For Artificial Neural Networks Part 1 Regression
    • Stock Sentiment Analysis Using Autoencoders
  • Natural Language
    Processing

    • Opinion Mining Aspect Level Sentiment Analysis
    • Sentiment Analysis using Autoencoders
    • Understanding Autoencoders With Examples
    • Word Embeddings Transformers In SVM Classifier
  • R

    • DataFrame to CSV
    • How to Create DataFrame in R
    • How To Use Grep In R
    • How To Use R Dplyr Package
    • Introduction To R DataFrames
    • Tidy Data In R
  • A.I. News
NbShare Notebooks
  • Publish Your Post On nbshare.io

  • R Python Pandas Data Science Excel NLP Numpy Pyspark Finance

Pandas Read and Write Excel File

Make sure you have openpyxl package installed. Otherwise you will get following error
...

ModuleNotFoundError: No module named 'openpyxl'

Install the package with following command
... pip install openpyxl

Pandas print excel sheet names

In [1]:
import pandas as pd

Pandas has ExcelFile method which returns Pandas excel object.

In [2]:
excel = pd.ExcelFile("stocks.xlsx")
excel.sheet_names
Out[2]:
['Sheet12']

Note you might run in to following error

ValueError: Worksheet index 0 is invalid, 0 worksheets found

which usually means the Excel file is corrupt. To fix this error, copy the data in to another excel file and save it.

ExcelFile has many methods. For example excel.dict will print the data of spreadsheet in dictionary format.

In [3]:
excel.__dict__
Out[3]:
{'io': 'stocks.xlsx',
 '_io': 'stocks.xlsx',
 'engine': 'openpyxl',
 'storage_options': None,
 '_reader': <pandas.io.excel._openpyxl.OpenpyxlReader at 0x7f4cb232c8e0>}

To convert the data in to Pandas Dataframe. We will use ExcelFile.parse() method.

Pandas Read Excel Files

In [4]:
excel = pd.ExcelFile("stocks.xlsx")
df = excel.parse()
In [5]:
df.head()
Out[5]:
Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3
0 NaN Stock Price Date
1 NaN INTC 28.9 2022-11-29 00:00:00
2 NaN AAPL 141.17 2022-11-29 00:00:00

Since our excel sheet has first column and row empty that is why we see headers and ist column as Unnamed and NaN respectively.

Let us fix it by specifying that header starts at row1.

In [6]:
excel.parse(header=1)
Out[6]:
Unnamed: 0 Stock Price Date
0 NaN INTC 28.90 2022-11-29
1 NaN AAPL 141.17 2022-11-29

To fix the column indexing, we can use "usecols" option as shown below.

In [7]:
excel.parse(usecols=[1,2,3],header=1)
Out[7]:
Stock Price Date
0 INTC 28.90 2022-11-29
1 AAPL 141.17 2022-11-29

To specify stock symbol as our index column, we can ues "index_col" option.

In [8]:
excel.parse(index_col="Stock",usecols=[1,2,3],header=1)
Out[8]:
Price Date
Stock
INTC 28.90 2022-11-29
AAPL 141.17 2022-11-29

We can also use pd.read_excel() method to achieve the same

In [9]:
pd.read_excel("stocks.xlsx",index_col="Stock",usecols=[1,2,3],header=1)
Out[9]:
Price Date
Stock
INTC 28.90 2022-11-29
AAPL 141.17 2022-11-29

Instead of specifying each column number, we can use range function to specify the columns which contain the data.

In [10]:
excel.parse(usecols=range(1,4),header=1)
Out[10]:
Stock Price Date
0 INTC 28.90 2022-11-29
1 AAPL 141.17 2022-11-29

let us save the dataframe in to a variable.

In [11]:
dfef = pd.read_excel("stocks.xlsx",usecols=range(1,4),header=1)
In [12]:
dfef.head()
Out[12]:
Stock Price Date
0 INTC 28.90 2022-11-29
1 AAPL 141.17 2022-11-29

Pandas write Dataframe to Excel File

We can write the dataframe in to Excel file using pd.to_excel() method.

In [13]:
dfef.to_excel("stocktmp.xlsx")
In [14]:
!ls -lrt stocktmp.xlsx
-rw-r--r-- 1 root root 5078 Nov 30 05:21 stocktmp.xlsx

Related Notebooks

  • How To Fix Error Pandas Cannot Open An Excel xlsx File
  • Write Single And Multi Line Comments In Python
  • How To Read CSV File Using Python PySpark
  • How To Read JSON Data Using Python Pandas
  • How To Write DataFrame To CSV In R
  • Select Pandas Dataframe Rows And Columns Using iloc loc and ix
  • Pandas How To Sort Columns And Rows
  • Summarising Aggregating and Grouping data in Python Pandas
  • Merge and Join DataFrames with Pandas in Python

Register

User Already registered.


Login

Login

We didn't find you! Please Register

Wrong Password!


Register
    Top Notebooks:
  • Data Analysis With Pyspark Dataframe
  • Strftime and Strptime In Python
  • Python If Not
  • Python Is Integer
  • Dictionaries in Python
  • How To install Python3.9 With Conda
  • String And Literal In Python 3
  • Privacy Policy
©