Python Read Last Line of Csv File

CSV files are one of the most popular formats for tabular data. The pandas.read_csv function loads a CSV file into a pandas DataFrame.

In this article, you will learn the different features of the pandas read_csv function beyond simply loading a CSV file, and the parameters that can be customized to get better output from it.

pandas.read_csv

Syntax: pandas.read_csv(filepath_or_buffer, sep, header, names, index_col, usecols, prefix, dtype, converters, skiprows, skipfooter, nrows, na_values, parse_dates)

Purpose: Read a comma-separated values (CSV) file into a DataFrame. Also supports optionally iterating over the file or breaking it into chunks.
Parameters:
- filepath_or_buffer : str, path object or file-like object. Any valid string path is acceptable. The string could be a URL too. Path object refers to os.PathLike. File-like objects are objects with a read() method, such as a file handle (e.g. via the built-in open function) or StringIO.
- sep : str, (Default ',') Delimiter that separates any two consecutive data items.
- header : int, list of int, (Default 'infer') Row number(s) to use as the column names, and the start of the data. The default behavior is to infer the column names: if no names are passed, the behavior is identical to header=0 and column names are inferred from the first line of the file.
- names : array-like. List of column names to use. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.
- index_col : int, str, sequence of int/str, or False, (Default None) Column(s) to use as the row labels of the DataFrame, either given as string name or column index. If a sequence of int/str is given, a MultiIndex is used.
- usecols : list-like or callable. Return a subset of the columns. If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True.
- prefix : str. Prefix to add to column numbers when there is no header, e.g. 'X' for X0, X1
- dtype : Type name or dict of column -> type. Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'}. Use str or object together with suitable na_values settings to preserve and not interpret dtype.
- converters : dict. Dict of functions for converting values in certain columns. Keys can either be integers or column labels.
- skiprows : list-like, int or callable. Line numbers to skip (0-indexed) or the number of lines to skip (int) at the start of the file. If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise.
- skipfooter : int. Number of lines at the bottom of the file to skip.
- nrows : int. Number of rows of the file to read. Useful for reading pieces of large files.
- na_values : scalar, str, list-like, or dict. Additional strings to recognize as NA/NaN. If a dict is passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.
- parse_dates : bool or list of int or names or list of lists or dict, (Default False) If set to True, will attempt to parse the index, else parse the columns passed.

Returns: DataFrame or TextParser. A comma-separated values (CSV) file is returned as a two-dimensional data structure with labeled axes.

For the full list of parameters, refer to the official documentation.

Reading CSV file

The pandas read_csv function can be used in different ways as needed, such as using custom separators, reading only selected columns or rows, and so on. All cases are covered below one after another.

Default Separator

To read a CSV file, call the pandas function read_csv() and pass the file path as input.

Step 1: Import Pandas

    import pandas as pd

Step 2: Read the CSV

    # Read the csv file
    df = pd.read_csv("data1.csv")

    # First 5 rows
    df.head()
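If you do not have data1.csv at hand, the same call can be tried on an in-memory string. The sketch below is only an illustration: the sample rows and column names are made up, and it relies on the fact that read_csv accepts any file-like object with a read() method, such as StringIO.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for data1.csv
csv_text = (
    "Rank,State,Population\n"
    "1,Uttar Pradesh,199812341\n"
    "2,Maharashtra,112374333"
)

# StringIO wraps the string in a file-like object that read_csv can consume
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column names inferred from the first line
```

The first line of the text is used as the header because header defaults to 'infer'.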

Different, Custom Separators

By default, a CSV file is separated by commas, but the pandas.read_csv function is not limited to that default. It can also read files that use other separators such as ;, | or :. To load such files, pass the separator used in the file through the sep parameter.

Let's load a file with the | separator:

    # Read the csv file with sep='|'
    df = pd.read_csv("data2.csv", sep='|')
    df
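To see what the sep parameter changes, here is a small sketch on made-up pipe-separated text (the rows are hypothetical, not the contents of data2.csv). It reads the same text once with the default separator and once with sep='|'.

```python
import io

import pandas as pd

# Hypothetical pipe-separated data
pipe_text = (
    "Rank|State|Population\n"
    "1|Uttar Pradesh|199812341\n"
    "2|Maharashtra|112374333"
)

# Default separator is ',', so each whole line ends up in a single column
df_wrong = pd.read_csv(io.StringIO(pipe_text))

# With sep='|' the three fields are split into three columns
df_right = pd.read_csv(io.StringIO(pipe_text), sep="|")

print(df_wrong.shape[1], df_right.shape[1])
```

A data frame with a single, oddly named column is usually the first sign that the separator was not the one you expected.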


Set any row as column header

Let's look at the data frame created using the pandas read_csv function without any header parameter:

    # Read the csv file
    df = pd.read_csv("data1.csv")
    df.head()

Row 0 of this data frame seems to be a better fit for the header, because it explains the figures in the table more clearly. You can make this row the header while reading the CSV by using the header parameter, which takes a row number.

Note: Row numbering starts from 0 and includes the current column header, so row 0 of the data frame above is row 1 of the file.

    # Read the csv file with header parameter
    df = pd.read_csv("data1.csv", header=1)
    df.head()
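The effect of header=1 is easier to see on a tiny example. The sketch below assumes a hypothetical file whose first line is a title and whose second line holds the real column names; the contents are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical file: line 0 is a title, line 1 holds the real column names
text = (
    "Census table,,\n"
    "Rank,State,Population\n"
    "1,Uttar Pradesh,199812341\n"
    "2,Maharashtra,112374333"
)

# Default: line 0 is used as the header, so the real names become a data row
df_default = pd.read_csv(io.StringIO(text))

# header=1: line 1 becomes the header and the line before it is discarded
df_header1 = pd.read_csv(io.StringIO(text), header=1)

print(list(df_default.columns))
print(list(df_header1.columns))
```

Lines that come before the chosen header row are dropped, which is why df_header1 has one row fewer than df_default.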

Renaming column headers

While reading the CSV file, you can rename the column headers by using the names parameter, which takes a list of column names.

    # Read the csv file with names parameter
    df = pd.read_csv("data.csv", names=['Ranking', 'ST Name', 'Pop', 'NS', 'D'])
    df.head()


To avoid the old header being read as a data row, also pass header=0. The names you supply then replace the old header instead of being added on top of it.

    # Read the csv file with header and names parameter
    df = pd.read_csv("data.csv", header=0, names=['Ranking', 'ST Name', 'Pop', 'NS', 'D'])
    df.head()
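The difference between passing names alone and passing names together with header=0 can be checked on a small, made-up sample. The column names and rows below are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical file that already has a header line
text = (
    "Rank,State,Population\n"
    "1,Uttar Pradesh,199812341\n"
    "2,Maharashtra,112374333"
)
new_names = ["Ranking", "ST Name", "Pop"]

# names only: the old header line is kept as the first data row
df_names_only = pd.read_csv(io.StringIO(text), names=new_names)

# header=0 plus names: the old header line is replaced by the new names
df_replaced = pd.read_csv(io.StringIO(text), header=0, names=new_names)

print(len(df_names_only), len(df_replaced))
```

Note that in df_names_only every column is read as text, because the old header strings are mixed in with the numbers.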

Loading CSV without column headers in pandas

There is a chance that the CSV file you load doesn't have any column header. By default, pandas will use the first row as the column header.

    # Read the csv file
    df = pd.read_csv("data3.csv")
    df.head()

To avoid any row being inferred as the column header, you can specify header as None. This forces pandas to create numbered columns starting from 0.

    # Read the csv file with header=None
    df = pd.read_csv("data3.csv", header=None)
    df.head()

Adding Prefixes to numbered columns

You can also give prefixes to the numbered column headers using the prefix parameter of the pandas read_csv function. Note that prefix was deprecated in pandas 1.4 and removed in pandas 2.0, so this call only works on older versions.

    # Read the csv file with header=None and prefix=column_
    df = pd.read_csv("data3.csv", header=None, prefix='column_')
    df.head()
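On recent pandas releases the prefix argument is no longer accepted by read_csv (it was deprecated in 1.4 and removed in 2.0). One possible substitute, shown here as a sketch on made-up headerless data, is to read with header=None and then rename the numbered columns with DataFrame.add_prefix. This is a swapped-in technique, not the approach used in the article.

```python
import io

import pandas as pd

# Hypothetical headerless data standing in for data3.csv
text = (
    "1,Uttar Pradesh,199812341\n"
    "2,Maharashtra,112374333"
)

# header=None produces integer column labels 0, 1, 2
df = pd.read_csv(io.StringIO(text), header=None)

# add_prefix prepends the string to every column label
df_prefixed = df.add_prefix("column_")

print(list(df.columns))
print(list(df_prefixed.columns))
```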

Set any column(s) as Index

By default, pandas adds a default integer index to the data frame loaded from the CSV file. You can control this behavior and make any column of your CSV the index by using the index_col parameter.

It takes the name of the column that should become the index.

Case 1: Making one column as index

    # Read the csv file with 'Rank' as index
    df = pd.read_csv("data.csv", index_col='Rank')
    df.head()

Case 2: Making multiple columns as index

For two or more columns to be made the index, pass them as a list.

    # Read the csv file with 'Rank' and 'Date' as index
    df = pd.read_csv("data.csv", index_col=['Rank', 'Date'])
    df.head()
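Both cases can be compared side by side on a small sample. The rows below are invented for illustration; only the column names Rank and Date come from the examples above.

```python
import io

import pandas as pd

# Hypothetical sample with the 'Rank' and 'Date' columns used above
text = (
    "Rank,State,Date\n"
    "1,Uttar Pradesh,2021-02-25\n"
    "2,Maharashtra,2021-04-14"
)

# One column as index: 'Rank' leaves the columns and becomes the row labels
df_single = pd.read_csv(io.StringIO(text), index_col="Rank")

# Several columns as index: pandas builds a MultiIndex from them
df_multi = pd.read_csv(io.StringIO(text), index_col=["Rank", "Date"])

print(df_single.index.name, list(df_single.columns))
print(type(df_multi.index).__name__, list(df_multi.index.names))
```

Columns used as the index no longer appear among the regular columns of the data frame.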

Selecting columns while reading CSV

In practice, not all the columns of a CSV file are important. You can select only the necessary columns after loading the file, but if you know which ones you need beforehand, you can save space and time by loading only those.

The usecols parameter takes the list of columns you want to load in your data frame.

Selecting columns using list

    # Read the csv file with 'Rank', 'Date' and 'Population' columns (list)
    df = pd.read_csv("data.csv", usecols=['Rank', 'Date', 'Population'])
    df.head()


Selecting columns using callable functions

The usecols parameter can also take a callable function. The function is evaluated on each column name, and only the columns for which it returns True are loaded.

    # Read the csv file with columns where length of column name > 10
    df = pd.read_csv("data.csv", usecols=lambda x: len(x) > 10)
    df.head()
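Here is a small sketch of both usecols styles on a one-row sample. The data is made up; the column names mirror the ones that appear in the outputs of this article.

```python
import io

import pandas as pd

# Hypothetical one-row sample
text = (
    "Rank,State,Population,National Share (%),Date\n"
    "1,Uttar Pradesh,199812341,16.51,2021-02-25"
)

# List form: load only the named columns
df_list = pd.read_csv(io.StringIO(text), usecols=["Rank", "Date"])

# Callable form: keep columns whose name is longer than 10 characters
df_callable = pd.read_csv(io.StringIO(text), usecols=lambda name: len(name) > 10)

print(list(df_list.columns))
print(list(df_callable.columns))
```

"Population" has exactly 10 characters, so the strict comparison leaves it out and only "National Share (%)" is loaded by the callable.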

Selecting/skipping rows while reading CSV

You can skip or select a specific number of rows from the dataset using the pandas.read_csv function. There are three parameters that can do this task: nrows, skiprows and skipfooter.

All of them have different functions. Let's discuss each of them separately.

A. nrows : This parameter allows you to control how many rows you want to load from the CSV file. It takes an integer specifying the row count.

    # Read the csv file with 5 rows
    df = pd.read_csv("data.csv", nrows=5)
    df

B. skiprows : This parameter allows you to skip rows from the start of the file. Passed as an integer, it skips that many lines from the top of the file, header line included; passed as a list, it skips the lines at those 0-indexed positions.

Skiprows by specifying row indices

    # Read the csv file with first row skipped
    df = pd.read_csv("data.csv", skiprows=1)
    df.head()

Skiprows by using callback function

The skiprows parameter can also take a callable function as input, which is evaluated on the row indices. The function is called for every row index to decide whether that row should be skipped.

    # Read the csv file with odd rows skipped
    df = pd.read_csv("data.csv", skiprows=lambda x: x % 2 != 0)
    df.head()

C. skipfooter : This parameter allows you to skip rows from the end of the file.

    # Read the csv file with 1 row skipped from the end
    # (skipfooter is only supported by the Python parsing engine)
    df = pd.read_csv("data.csv", skipfooter=1, engine='python')
    df.tail()
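The three parameters are easy to mix up, so the sketch below runs each of them on the same five-row sample. The data is invented; the comments state which file lines (0-indexed, header included) survive in each case.

```python
import io

import pandas as pd

# Hypothetical file: line 0 is the header, lines 1-5 are data
text = "\n".join(["Rank,State", "1,A", "2,B", "3,C", "4,D", "5,E"])

# nrows=2: read only the first two data rows
first_two = pd.read_csv(io.StringIO(text), nrows=2)

# skiprows=1 (int): drop the first line, so "1,A" becomes the header
skip_int = pd.read_csv(io.StringIO(text), skiprows=1)

# skiprows=[1, 2] (list): drop file lines 1 and 2, keep the header on line 0
skip_list = pd.read_csv(io.StringIO(text), skiprows=[1, 2])

# skiprows callable: drop odd-numbered file lines, keep lines 0, 2 and 4
skip_odd = pd.read_csv(io.StringIO(text), skiprows=lambda i: i % 2 != 0)

# skipfooter=1: drop the last line; requires the Python engine
no_footer = pd.read_csv(io.StringIO(text), skipfooter=1, engine="python")

print(first_two["Rank"].tolist())
print(list(skip_int.columns))
print(skip_list["Rank"].tolist())
print(skip_odd["Rank"].tolist())
print(no_footer["Rank"].tolist())
```

If you want to drop the first data row but keep the header, the list form skiprows=[1] is the one to use, not the integer form.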

Changing the data type of columns

You can specify the data types of columns while reading the CSV file. The dtype parameter takes a dictionary that maps column names to data types. To assign the data types, you can import them from the numpy package and mention them against suitable columns.

Data Type of Rank before change

    # Read the csv file
    df = pd.read_csv("data.csv")

    # Display datatype of Rank
    df.Rank.dtypes

    dtype('int64')

Data Type of Rank after change

    # import numpy
    import numpy as np

    # Read the csv file with data type specified for Rank.
    df = pd.read_csv("data.csv", dtype={'Rank': np.int8})

    # Display datatype of Rank
    df.Rank.dtypes

    dtype('int8')
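The same comparison can be reproduced without the data file. The sketch below uses a made-up two-row sample and prints the dtype of Rank with and without the dtype argument.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample with a small integer column
text = "Rank,State\n1,A\n2,B"

# Without dtype, integer columns are read as int64
df_default = pd.read_csv(io.StringIO(text))

# With dtype, 'Rank' is stored as int8 (one byte per value)
df_small = pd.read_csv(io.StringIO(text), dtype={"Rank": np.int8})

print(df_default["Rank"].dtype, df_small["Rank"].dtype)
```

Keep in mind that int8 can only hold values from -128 to 127, so a narrow type like this is safe only when you know the range of the column in advance.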

Parse Dates while reading CSV

Date time values are crucial for data analysis. You can convert a column to a datetime type while reading the CSV in two ways:

Method 1. Make the desired column the index and pass parse_dates=True

    # Read the csv file with 'Date' as index and parse_dates=True
    df = pd.read_csv("data.csv", index_col='Date', parse_dates=True, nrows=5)

    # Display index
    df.index

    DatetimeIndex(['2021-02-25', '2021-04-14', '2021-02-19', '2021-02-24',
                   '2021-02-13'],
                  dtype='datetime64[ns]', name='Date', freq=None)

Method 2. Pass desired column in parse_dates as list

    # Read the csv file with parse_dates=['Date']
    df = pd.read_csv("data.csv", parse_dates=['Date'], nrows=5)

    # Display datatypes of columns
    df.dtypes

    Rank                           int64
    State                         object
    Population                    object
    National Share (%)            object
    Date                  datetime64[ns]
    dtype: object
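Both methods can be tried on a short in-memory sample. The rows below are made up, with dates written in ISO format (YYYY-MM-DD), which pandas parses without extra hints.

```python
import io

import pandas as pd

# Hypothetical sample with ISO-formatted dates
text = "Rank,Date\n1,2021-02-25\n2,2021-04-14"

# Method 1: 'Date' becomes the index and parse_dates=True parses the index
df_idx = pd.read_csv(io.StringIO(text), index_col="Date", parse_dates=True)

# Method 2: 'Date' stays a regular column and is parsed by name
df_col = pd.read_csv(io.StringIO(text), parse_dates=["Date"])

print(type(df_idx.index).__name__)
print(df_col["Date"].dt.month.tolist())
```

Once a column is a datetime type, the .dt accessor gives direct access to parts such as year, month and day.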

Adding more NaN values

Pandas recognizes many common representations of missing values. But there are many cases where the data marks missing values in forms that are not in the pandas NA values list. For example, it doesn't treat 'missing', 'not found', or 'not available' as missing values.

So, you need to declare them as missing. To do this, use the na_values parameter, which takes a list of such values.

Loading CSV without specifying na_values

    # Read the csv file
    df = pd.read_csv("data.csv", nrows=5)
    df

Loading CSV with specifying na_values

    # Read the csv file with 'missing' as na_values
    df = pd.read_csv("data.csv", na_values=['missing'], nrows=5)
    df
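The effect of na_values shows up both in the missing-value count and in the column type. The sketch below uses invented rows in which the Population column contains the markers 'missing' and 'not available'.

```python
import io

import pandas as pd

# Hypothetical sample with custom missing-value markers
text = (
    "Rank,State,Population\n"
    "1,Uttar Pradesh,missing\n"
    "2,Maharashtra,112374333\n"
    "3,Bihar,not available"
)

# Without na_values the markers are ordinary strings
df_plain = pd.read_csv(io.StringIO(text))

# With na_values they are turned into NaN while reading
df_na = pd.read_csv(io.StringIO(text), na_values=["missing", "not available"])

print(df_plain["Population"].isna().sum(), df_na["Population"].isna().sum())
print(df_na["Population"].dtype)
```

Because the remaining values are all numeric, the column in df_na is read as a float column rather than as text.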

Convert values of the column while reading CSV

You can transform, modify, or convert the values of the columns of the CSV file while loading the CSV itself. This can be done by using the converters parameter. converters takes a dictionary whose keys are the column names and whose values are the functions to be applied to them.

Let's convert the comma separated values (i.e. 19,98,12,341) of the Population column in the dataset to integer value (199812341) while reading the CSV.

    # Function which converts comma separated value to integer
    toInt = lambda x: int(x.replace(',', '')) if x != 'missing' else -1

    # Read the csv file
    df = pd.read_csv("data.csv", converters={'Population': toInt})
    df.head()
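A self-contained version of the same idea is sketched below. The sample rows are made up, and the helper name to_int is only an illustrative rewrite of the lambda above as a named function. The population value is quoted in the sample so that its commas are not treated as field separators.

```python
import io

import pandas as pd


def to_int(value):
    """Remove the digit-group commas; map the 'missing' marker to -1."""
    return int(value.replace(",", "")) if value != "missing" else -1


# Hypothetical sample: the quoted field keeps its commas as part of the value
text = 'Rank,Population\n1,"19,98,12,341"\n2,missing'

# The converter receives each raw cell of 'Population' as a string
df = pd.read_csv(io.StringIO(text), converters={"Population": to_int})

print(df["Population"].tolist())
```

Using -1 as a stand-in for missing data keeps the column an integer type, at the cost of having to filter that value out before any arithmetic.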

Practical Tips

Before loading the CSV file into a pandas data frame, always skim through the file first. It will help you judge which columns you should import and determine what data types your columns should have.
You should also watch out for the total row count of the dataset. A system with 4 GB RAM may not be able to load 7-8M rows.
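When a file is too large to load at once, one option is the chunksize parameter of read_csv, which returns an iterator of smaller data frames instead of a single one. The sketch below is only an illustration on ten made-up rows; the chunk size of 4 is arbitrary, and on a real file you would pick a value that fits your memory.

```python
import io

import pandas as pd

# Hypothetical sample: a header plus ten data rows
text = "Rank,Population\n" + "\n".join(f"{i},{i * 10}" for i in range(1, 11))

total = 0
n_chunks = 0

# Each chunk is a regular DataFrame with at most 4 rows
for chunk in pd.read_csv(io.StringIO(text), chunksize=4):
    total += chunk["Population"].sum()
    n_chunks += 1

print(n_chunks, total)
```

Only one chunk is held in memory at a time, so running totals, counts and filters can be computed on files that would not fit in RAM as a whole.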

Test your knowledge

Q1: You cannot load files with the $ separator using the pandas read_csv function. True or False?

Answer: False, because you can pass the separator through the sep parameter of the read_csv function.

Q2: What is the use of the converters parameter in the read_csv function?

Answer: The converters parameter is used to change the values of the columns while loading the CSV.

Q3: How will you make pandas recognize that a particular column is datetime type?

Answer: By using the parse_dates parameter.

Q4: A dataset contains missing values no, not available, and '-100'. How will you specify them as missing values for Pandas to correctly interpret them? (Assume CSV file name: example1.csv)

Answer: By using the na_values parameter.

    import pandas as pd

    df = pd.read_csv("example1.csv", na_values=['no', 'not available', '-100'])

Q5: How would you read a CSV file where,

The heading of the columns is in the 3rd row (numbered from 1).
The last 5 lines of the file have garbage text and should be avoided.
Only the column names whose first letter is a vowel should be included. Assume they are one word only.

(CSV file name: example2.csv)

Answer:

    import pandas as pd

    # True for column names that start with a vowel
    colnameWithVowels = lambda x: x.lower()[0] in ['a', 'e', 'i', 'o', 'u']

    # header=2 is the 3rd row (0-indexed); skipfooter needs the Python engine
    df = pd.read_csv("example2.csv", usecols=colnameWithVowels, header=2, skipfooter=5, engine='python')

This article was contributed by Kaustubh G and Shrivarsheni.


Source: https://www.machinelearningplus.com/pandas/pandas-read_csv-completed/