Pandas tutorial: All you need to know about pandas dataframe | Sole Concept Lab

Pandas tutorial: All you need to know about pandas dataframe

Pandas is a widely used Python package that provides fast, expressive data structures. The most commonly used of these is the DataFrame, a two-dimensional labeled table whose columns can hold different data types. In this brief pandas tutorial, I will quickly walk you through some of the most commonly used syntax for working with dataframes in Python.

However, before we begin, let me remind you that pandas comes with a lot of functions, and these can be found in its documentation. Let us look at the order in which we will proceed.

  • Create dataframe
  • Read files into dataframe
  • Get basic information about the data
  • Perform some of the most commonly used Excel operations on a pandas dataframe

1. Creating a pandas dataframe

Pandas dataframes can be created in multiple ways. We will quickly look at the two most common ones.

  • Using lists
  • Using dictionaries

Let me show you the snippet of code before I explain it to you.

creating a pandas dataframe

 

Like any other module, we begin by importing it, here as pd. I then created two lists: a, which contains only numbers, and b, which contains only strings. To create a dataframe, all you have to do is call the DataFrame constructor from pandas and pass in the two lists inside square brackets, as shown in the image above. I have stored the dataframe in the variable df. Let's quickly take a look at the output.

We can clearly see that the dataframe has three columns and two rows, because each list becomes one row. The columns do not have names; instead they carry the default integer labels 0, 1, 2. Each row also has an index label, and these are numbers as well. The row labels can be accessed through the attribute df.index (I will cover it in the next post).
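The original snippet survives only as a screenshot, so here is a minimal sketch of what it describes. The values in a and b are made up for illustration; only the shape of the result follows from the text.

```python
import pandas as pd

# Hypothetical values: the post only says that a holds numbers and b holds strings.
a = [1, 2, 3]
b = ["x", "y", "z"]

# Passing a list of lists: each inner list becomes one ROW of the dataframe.
df = pd.DataFrame([a, b])

print(df)                 # two rows, three unnamed columns
print(df.shape)           # (2, 3)
print(list(df.columns))   # [0, 1, 2]
print(list(df.index))     # [0, 1]
```

Because a column here mixes a number with a string, pandas stores these columns with a generic object dtype.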

Another commonly used way to create a dataframe is to use a dictionary. It is very similar to the above procedure, with one slight difference that we will look at in a moment. First, let us create the dataframe. Picking up from the previously shown code, I create a dictionary c with two keys named ‘a’ and ‘b’ and pass in the lists we created previously as their values. Let’s have a look at this.

creating a pandas dataframe with dictionary

Creating the dataframe is very similar to the steps described above. All you have to do is call the DataFrame constructor from pandas and pass in the dictionary c. Have a look at the output above. What do you see as a difference? Can you point it out?

I think you figured it out. The difference is that we now have two labeled columns, a and b, with 3 rows each: the dictionary keys become the column names and each list becomes a column. The index is still the default run of integers. You have just learned how to create a pandas dataframe. Now it is your turn to explore and try using various data types. Get your hands dirty with it; there is no better way to learn. Let us quickly move on to the other ways of creating and using dataframes that we discussed in the beginning.
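As a rough sketch of the dictionary version (again with made-up list values, since the original code is only available as an image):

```python
import pandas as pd

# Hypothetical values standing in for the lists from the first example.
a = [1, 2, 3]
b = ["x", "y", "z"]

# Each key becomes a column name and each list becomes that column's values.
c = {"a": a, "b": b}
df = pd.DataFrame(c)

print(df)
print(df.shape)           # (3, 2)
print(list(df.columns))   # ['a', 'b']
```

Compared with the list-of-lists version, the data is now laid out column-wise, which is why the shape flips from (2, 3) to (3, 2).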

2. Creating pandas dataframe by reading in a file.

This is somewhat easier. All you have to do is read in a file using one of the various reader functions that pandas gives us. Here we will see how to read two types of files: a CSV file and an Excel file. Reading a CSV file into a pandas dataframe is very straightforward. Let's get started. To read a CSV file, all we have to do is call the read_csv function from pandas. Take a look at the command in the figure below.

Reading data into a pandas dataframe

Here you can see that to read data into a dataframe, all you have to do is pass the location of the file as the argument to read_csv. The second line of the code calls the head method, which displays the first 5 rows of the dataframe.
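The two lines described above can be sketched as follows. The file used in the post is not known, so this example first writes a small stand-in CSV to a temporary folder; the file name and its contents are assumptions made purely so the snippet runs on its own.

```python
import os
import tempfile

import pandas as pd

# Stand-in data (hypothetical): seven rows so that head() has something to trim.
csv_text = (
    "OrderId,TotalCharges\n"
    "1,10.5\n2,20.0\n3,7.25\n4,13.0\n5,42.0\n6,8.0\n7,19.9\n"
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "orders.csv")   # hypothetical file name
    with open(path, "w") as handle:
        handle.write(csv_text)

    # The actual technique: pass the file location to read_csv.
    df = pd.read_csv(path)

# head() returns the first 5 rows by default; head(n) returns the first n.
first_rows = df.head()
print(first_rows)
```

With your own data, only the read_csv and head lines are needed; everything around them exists just to create a file to read.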

Moving on to the second scenario, we will read data from an Excel file. Here the Excel file consists of multiple sheets, and in such a case we can also list the names of these sheets. Take a look at the snapshot below to see how this can be done. At this point, the image is self-explanatory. Leave a comment with what you understand from the image.

Now you can clearly see that there are multiple sheets available. How do you, say, parse the sheet named ‘Purchase Data – Study’ and get the data from it? That’s easy peasy! Just take the object you loaded the workbook into and parse the sheet by name. That’s it!
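Since the screenshots are missing, here is one way the sheet listing and parsing could look. It assumes the workbook was opened with pd.ExcelFile, which may not be exactly what the original code did; the helper names list_sheets and read_sheet and the file name in the usage comments are made up for illustration.

```python
import pandas as pd


def list_sheets(path):
    """Return the names of all sheets in the workbook at `path`."""
    with pd.ExcelFile(path) as workbook:
        return list(workbook.sheet_names)


def read_sheet(path, sheet_name):
    """Parse one named sheet of the workbook at `path` into a dataframe."""
    with pd.ExcelFile(path) as workbook:
        return workbook.parse(sheet_name)


# Usage (hypothetical file name; the sheet name is the one mentioned in the post):
# print(list_sheets("purchases.xlsx"))
# df = read_sheet("purchases.xlsx", "Purchase Data – Study")
# print(df.head())
```

Reading .xlsx files requires an Excel engine such as openpyxl to be installed alongside pandas. If you only need a single sheet, pd.read_excel(path, sheet_name=...) is a one-call alternative.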

By now you would have gained a basic understanding of how pandas can be used. Let us take it a step further and see some of the common operations that can be performed once the data is read.

3. Get Basic information about the data

The first thing we would love to know after the dataframe is set up is the number of rows and columns it has. This can be obtained from the shape attribute of the dataframe. Take a look below.

Here the result is a tuple. Think of it as a matrix of shape (m, n): the number of rows is 2891 and the number of columns is 7. Moving ahead, we would love to know the type of data in each column. This can be easily obtained by using the .info method. Have a look at what it prints.
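A small sketch of reading the shape, using a tiny made-up dataframe rather than the 2891 x 7 dataset from the post. The column names are borrowed from the post, but the values are invented.

```python
import pandas as pd

# Hypothetical stand-in for the post's dataset.
df = pd.DataFrame(
    {
        "OrderId": [1, 2, 3, 4],
        "TotalCharges": [10.5, 20.0, 7.25, 13.0],
    }
)

# shape is an attribute, not a method: no parentheses after it.
n_rows, n_cols = df.shape
print(df.shape)   # (4, 2)
print(n_rows, n_cols)
```

Unpacking the tuple into two names is handy when only the row count or only the column count is needed later.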

Getting basic info of a dataframe

This is absolutely perfect: it reports the type of data in each column and also tells you how many non-null values each column has. Take a look. OrderId is a non-null column with datatype int64, and OrderDate is also non-null but is of type datetime64[ns]. I am sure you can figure out the rest. Pandas is this easy. It has many more commonly used commands to help you get started, but I will leave it here for now. In later posts we will cover the others in detail.
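To make the .info output concrete, here is a sketch on a small made-up dataframe. The column names OrderId and OrderDate come from the post; the values are invented. Note that info prints its summary rather than returning it, so the example captures the text through the buf argument.

```python
import io

import pandas as pd

# Hypothetical stand-in data with an integer column and a date column.
df = pd.DataFrame(
    {
        "OrderId": [1, 2, 3],
        "OrderDate": ["2020-01-01", "2020-01-02", "2020-01-03"],
    }
)

# Convert the text dates into a real datetime column.
df["OrderDate"] = pd.to_datetime(df["OrderDate"])

# info() writes to the screen by default; buf redirects it into a string.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# dtypes gives just the per-column types if that is all you need.
print(df.dtypes)
```

In everyday use a plain df.info() is enough; the buffer is only there so the summary can be kept as text.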

4. Commonly used excel functions on pandas dataframe

Can you think of the magic that Excel can do with its tables? Can you name the feature that causes the magic? You are right, it's the pivot table. Pandas can handle pretty much everything that Excel does. Take a look at this blog to explore more. I will quickly show you how you can perform two operations, namely groupby and pivot. Let’s start with the code.

Groupby operation on a pandas dataframe

You can see that the useful column here is UserId: the number in it is nothing but the number of users who joined on that particular date. Moving on to another commonly used Excel feature, the pivot table: in pandas you can simply call the .pivot_table method to generate one. Take a look at the code below.
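The groupby code itself is not recoverable from the page, so the following is only a sketch of the idea described: count users per joining date. The UserId column is named in the post, while the JoinDate column name and all values are assumptions.

```python
import pandas as pd

# Hypothetical data: five users who joined on two different dates.
users = pd.DataFrame(
    {
        "JoinDate": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-02"],
        "UserId": [101, 102, 103, 104, 105],
    }
)

# Group the rows by date, then count the UserId entries in each group.
joined_per_day = users.groupby("JoinDate")["UserId"].count()
print(joined_per_day)
```

The result is a Series indexed by date: 2 users for the first date and 3 for the second. Swapping count() for nunique() would count distinct users instead, in case an id can repeat.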

It is very clear here: when you pivot the data, the index used is OrderDate and the values of the pivot table are the mean of TotalCharges. This enables us to monitor the mean earnings over a period of time. This can also be represented graphically.
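A minimal sketch of such a pivot, using the OrderDate and TotalCharges column names from the post with invented values. The aggfunc is spelled out here for clarity, although mean is also the default for pivot_table.

```python
import pandas as pd

# Hypothetical orders: two on the first day, one on the second, two on the third.
orders = pd.DataFrame(
    {
        "OrderDate": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-03"]
        ),
        "TotalCharges": [10.0, 30.0, 20.0, 40.0, 60.0],
    }
)

# One row per OrderDate, holding the mean TotalCharges for that date.
pivot = orders.pivot_table(index="OrderDate", values="TotalCharges", aggfunc="mean")
print(pivot)
```

With these numbers the daily means come out as 20.0, 20.0 and 50.0. If matplotlib is installed, calling pivot.plot() is one way to draw the trend over time.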

Summary:

So far in this post I have introduced you to pandas dataframes and walked you through some of their very basic functionality and some of the commonly used syntax. If you have trouble understanding this or are very new to Python, I recommend going through the Python for beginners series.

Stay tuned for more... but first, get your hands dirty with pandas dataframes.
