{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# pd.DataFrame\n", "\n", "`pd.DataFrame` --- по сути дела таблица, на которую можно смотреть как на объединение столбцов `pd.Series` с выравниванием по общему индексу. \n", "\n", "\n", "Можно изменять содержимое ячеек таблицы, добавление новых строк --- дорогостоящая операция (пересоздание таблицы), но добавление столбцов осуществляется на месте.\n", "\n", "## Создание таблицы\n", "\n", "Есть множество способов создать датафрейм из уже существующих объектов python. \n", "\n", "Один самых удобных --- используя словари. \n", "\n", "Если ключи в словаре соответствуют именам столбцов, а значения --- списки, `NumPy` массивы, `pandas` серии данных этих столбцов, то используется конструктор `pd.DataFrame`. При этом если в значениях присутствуют объекты `pd.Series`, то индекс созданного датафрейма будет объединением индексов серий, а данные всех столбцов выравниваются по меткам этого индекса." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col1col2col3
a1.04.05
b2.03.06
cNaNNaN7
\n", "
" ], "text/plain": [ " col1 col2 col3\n", "a 1.0 4.0 5\n", "b 2.0 3.0 6\n", "c NaN NaN 7" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "col1 = pd.Series([1, 2], index=[\"a\", \"b\"])\n", "col2 = pd.Series([3, 4], index=[\"b\", \"a\"])\n", "col3 = pd.Series([5, 6, 7], index=[\"a\", \"b\", \"c\"])\n", "\n", "d = {\n", " 'col1': col1, \n", " 'col2': col2,\n", " 'col3': col3,\n", "}\n", "\n", "df = pd.DataFrame(d)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Если значения в словаре списки или массивы, то они должны быть одной длинны. В созданной таблице данные в столбцах идут в том же порядке, что и в исходных списках/массивах (без выравнивания), что и в списках. Индекс созданной таблицы по умолчанию --- `RangeIndex`, но можно явно указать его при создании опциональным параметром `index`." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col1col2
a13
b24
\n", "
" ], "text/plain": [ " col1 col2\n", "a 1 3\n", "b 2 4" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "col1 = np.array([1, 2])\n", "col2 = [3, 4]\n", "\n", "d = {\n", " \"col1\": col1,\n", " \"col2\": col2\n", "}\n", "\n", "df = pd.DataFrame(d, index=[\"a\", \"b\"])\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[pd.DataFrame.form_dict](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.from_dict.html#pandas.DataFrame.from_dict) --- более специализированный метод для создания датафрейма из словаря. В предыдущих двух примерах он сработал бы аналогично, но его имя длиннее. Его преимущество заключается, например, в том, что он позволяет создать датафрейм из словаря, в котором ключи --- метки строк (индекс), а значения --- строки будущей таблицы.\n", "\n", "Для этого необходимо указать в качестве параметра `orient` строку `index`. Далее все аналогично. Если строки представлены в виде `pd.Series`, то метки попадут в названия столбцов с выравниванием данных. Если это простые массивы, то имена столбцов --- `RangeIndex`, но можно явно указать имена столбцов по параметру `columns`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ab
row112
row243
\n", "
" ], "text/plain": [ " a b\n", "row1 1 2\n", "row2 4 3" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "row1 = pd.Series([1, 2], index=[\"a\", \"b\"])\n", "row2 = pd.Series([3, 4], index=[\"b\", \"a\"])\n", "\n", "d = {\n", " \"row1\": row1,\n", " \"row2\": row2,\n", "}\n", "\n", "pd.DataFrame.from_dict(d, orient=\"index\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Если данные представлены в виде какой-то двухмерной структуры, в которой отсутствуют метки и строк и столбцов, то можно воспользоваться методом `pd.DataFrame`, при этом индекс и имена столбцов можно указать соответствующими параметрами `index` и `columns`. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col1col2
row112
row234
row356
\n", "
" ], "text/plain": [ " col1 col2\n", "row1 1 2\n", "row2 3 4\n", "row3 5 6" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = [\n", " [1, 2],\n", " [3, 4],\n", " [5, 6]\n", "]\n", "\n", "pd.DataFrame(data, index=[\"row1\", \"row2\", \"row3\"], columns=[\"col1\", \"col2\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Хранение и чтение таблиц \n", "\n", "Для хранения таблиц чаще всего используются \n", "\n", "Гораздо чаще датафрейм получается, как результат чтение таблицы из файла. Чаще всего такие таблицы хранятся в формате [csv](https://ru.wikipedia.org/wiki/CSV) или в форматах таблиц `excel`.\n", "\n", "\n", "> Строка таблицы соответствует строке текста, которая содержит одно или несколько полей, разделенных запятыми.\n", "\n", "\n", "```\n", "brand,mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb\n", "Mazda RX4,21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4\n", "Mazda RX4 Wag,21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4\n", "Datsun 710,22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1\n", "Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1\n", "Hornet Sportabout,18.7,8,360.0,175,3.15,3.44,17.02,0,0,3,2\n", "Valiant,18.1,6,225.0,105,2.76,3.46,20.22,1,0,3,1\n", "Duster 360,14.3,8,360.0,245,3.21,3.57,15.84,0,0,3,4\n", "Merc 240D,24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2\n", "Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2\n", "Merc 280,19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4\n", "Merc 280C,17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4\n", "Merc 450SE,16.4,8,275.8,180,3.07,4.07,17.4,0,0,3,3\n", "Merc 450SL,17.3,8,275.8,180,3.07,3.73,17.6,0,0,3,3\n", "Merc 450SLC,15.2,8,275.8,180,3.07,3.78,18.0,0,0,3,3\n", "Cadillac Fleetwood,10.4,8,472.0,205,2.93,5.25,17.98,0,0,3,4\n", "Lincoln Continental,10.4,8,460.0,215,3.0,5.424,17.82,0,0,3,4\n", "Chrysler Imperial,14.7,8,440.0,230,3.23,5.345,17.42,0,0,3,4\n", "Fiat 128,32.4,4,78.7,66,4.08,2.2,19.47,1,1,4,1\n", "Honda Civic,30.4,4,75.7,52,4.93,1.615,18.52,1,1,4,2\n", "Toyota Corolla,33.9,4,71.1,65,4.22,1.835,19.9,1,1,4,1\n", "Toyota Corona,21.5,4,120.1,97,3.7,2.465,20.01,1,0,3,1\n", "Dodge Challenger,15.5,8,318.0,150,2.76,3.52,16.87,0,0,3,2\n", "AMC Javelin,15.2,8,304.0,150,3.15,3.435,17.3,0,0,3,2\n", "Camaro Z28,13.3,8,350.0,245,3.73,3.84,15.41,0,0,3,4\n", "Pontiac Firebird,19.2,8,400.0,175,3.08,3.845,17.05,0,0,3,2\n", "Fiat X1-9,27.3,4,79.0,66,4.08,1.935,18.9,1,1,4,1\n", "Porsche 914-2,26.0,4,120.3,91,4.43,2.14,16.7,0,1,5,2\n", "Lotus Europa,30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2\n", "Ford Pantera L,15.8,8,351.0,264,4.22,3.17,14.5,0,1,5,4\n", "Ferrari Dino,19.7,6,145.0,175,3.62,2.77,15.5,0,1,5,6\n", "Maserati Bora,15.0,8,301.0,335,3.54,3.57,14.6,0,1,5,8\n", "Volvo 142E,,,,,,,,,,,\n", "\n", "```\n", "\n", "Пример простейшего `csv` файла приведен выше. В этом примере первая строка выступает в качестве заголовка, т.е. в ней перечислены названия столбцов, разделенные запятыми. Далее в каждой строке приведены значения этих столбцов, разделенные запятыми. В последней строке пропущены значения всех столбцов, кроме первого (brand).\n", "\n", "\n", "Предположим, что текстовый файл с таким содержимым хранится в папке `data` под именем `mtcars.csv`. Для чтения такой таблицы используется метод [read_csv](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html). " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 32 entries, 0 to 31\n", "Data columns (total 12 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 brand 32 non-null object \n", " 1 mpg 31 non-null float64\n", " 2 cyl 31 non-null float64\n", " 3 disp 31 non-null float64\n", " 4 hp 31 non-null float64\n", " 5 drat 31 non-null float64\n", " 6 wt 31 non-null float64\n", " 7 qsec 31 non-null float64\n", " 8 vs 31 non-null float64\n", " 9 am 31 non-null float64\n", " 10 gear 31 non-null float64\n", " 11 carb 31 non-null float64\n", "dtypes: float64(11), object(1)\n", "memory usage: 3.1+ KB\n", "None\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
brandmpgcyldisphpdratwtqsecvsamgearcarb
0Mazda RX421.06.0160.0110.03.902.62016.460.01.04.04.0
1Mazda RX4 Wag21.06.0160.0110.03.902.87517.020.01.04.04.0
2Datsun 71022.84.0108.093.03.852.32018.611.01.04.01.0
3Hornet 4 Drive21.46.0258.0110.03.083.21519.441.00.03.01.0
4Hornet Sportabout18.78.0360.0175.03.153.44017.020.00.03.02.0
\n", "
" ], "text/plain": [ " brand mpg cyl disp hp drat wt qsec vs am \\\n", "0 Mazda RX4 21.0 6.0 160.0 110.0 3.90 2.620 16.46 0.0 1.0 \n", "1 Mazda RX4 Wag 21.0 6.0 160.0 110.0 3.90 2.875 17.02 0.0 1.0 \n", "2 Datsun 710 22.8 4.0 108.0 93.0 3.85 2.320 18.61 1.0 1.0 \n", "3 Hornet 4 Drive 21.4 6.0 258.0 110.0 3.08 3.215 19.44 1.0 0.0 \n", "4 Hornet Sportabout 18.7 8.0 360.0 175.0 3.15 3.440 17.02 0.0 0.0 \n", "\n", " gear carb \n", "0 4.0 4.0 \n", "1 4.0 4.0 \n", "2 4.0 1.0 \n", "3 3.0 1.0 \n", "4 3.0 2.0 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os\n", "path = os.path.join(\"data\", \"mtcars.csv\")\n", "\n", "df = pd.read_csv(path)\n", "print(df.info())\n", "df.head()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "В самом простом варианте использования функции `read_csv` \n", "- имена столбцов распознаются из первой строки файла (параметром `header` можно повлиять на это);\n", "- в качестве индекса генерируется `RangeIndex` (параметром `index_col` можно выбрать столбец индекса);\n", "- в качестве разделителя ожидается символ запятой (`,`) (параметром `sep` можно на это повлиять);\n", "- пропущенные значения заполняются `np.nan`;\n", "- столбцы с датами не распознаются.\n", "\n", "```{note}\n", "Метод [DataFrame.head](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.head.html) возвращает первые `n` строк таблицы. По умолчанию `n` равно 5, но можно указать явно. Похожий по смыслу метод [DataFrame.tail](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.tail.html) возвращает последние `n` строк.\n", "\n", "Метод [DataFrame.info](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.info.html) печатает информацию о таблице. В частности, из вывода этой функции можно понять количество строк и столбцов, тип индекса таблицы, имя каждого столбца, тип данных и количество непропущенных значений в них.\n", "```\n", "\n", "Считаем эту таблицу ещё раз, указав в этот раз в качестве индекса столбец `brand`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcyldisphpdratwtqsecvsamgearcarb
brand
Mazda RX421.06.0160.0110.03.902.62016.460.01.04.04.0
Mazda RX4 Wag21.06.0160.0110.03.902.87517.020.01.04.04.0
Datsun 71022.84.0108.093.03.852.32018.611.01.04.01.0
Hornet 4 Drive21.46.0258.0110.03.083.21519.441.00.03.01.0
Hornet Sportabout18.78.0360.0175.03.153.44017.020.00.03.02.0
Valiant18.16.0225.0105.02.763.46020.221.00.03.01.0
Duster 36014.38.0360.0245.03.213.57015.840.00.03.04.0
Merc 240D24.44.0146.762.03.693.19020.001.00.04.02.0
Merc 23022.84.0140.895.03.923.15022.901.00.04.02.0
Merc 28019.26.0167.6123.03.923.44018.301.00.04.04.0
Merc 280C17.86.0167.6123.03.923.44018.901.00.04.04.0
Merc 450SE16.48.0275.8180.03.074.07017.400.00.03.03.0
Merc 450SL17.38.0275.8180.03.073.73017.600.00.03.03.0
Merc 450SLC15.28.0275.8180.03.073.78018.000.00.03.03.0
Cadillac Fleetwood10.48.0472.0205.02.935.25017.980.00.03.04.0
Lincoln Continental10.48.0460.0215.03.005.42417.820.00.03.04.0
Chrysler Imperial14.78.0440.0230.03.235.34517.420.00.03.04.0
Fiat 12832.44.078.766.04.082.20019.471.01.04.01.0
Honda Civic30.44.075.752.04.931.61518.521.01.04.02.0
Toyota Corolla33.94.071.165.04.221.83519.901.01.04.01.0
Toyota Corona21.54.0120.197.03.702.46520.011.00.03.01.0
Dodge Challenger15.58.0318.0150.02.763.52016.870.00.03.02.0
AMC Javelin15.28.0304.0150.03.153.43517.300.00.03.02.0
Camaro Z2813.38.0350.0245.03.733.84015.410.00.03.04.0
Pontiac Firebird19.28.0400.0175.03.083.84517.050.00.03.02.0
Fiat X1-927.34.079.066.04.081.93518.901.01.04.01.0
Porsche 914-226.04.0120.391.04.432.14016.700.01.05.02.0
Lotus Europa30.44.095.1113.03.771.51316.901.01.05.02.0
Ford Pantera L15.88.0351.0264.04.223.17014.500.01.05.04.0
Ferrari Dino19.76.0145.0175.03.622.77015.500.01.05.06.0
Maserati Bora15.08.0301.0335.03.543.57014.600.01.05.08.0
Volvo 142ENaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
\n", "
" ], "text/plain": [ " mpg cyl disp hp drat wt qsec vs am \\\n", "brand \n", "Mazda RX4 21.0 6.0 160.0 110.0 3.90 2.620 16.46 0.0 1.0 \n", "Mazda RX4 Wag 21.0 6.0 160.0 110.0 3.90 2.875 17.02 0.0 1.0 \n", "Datsun 710 22.8 4.0 108.0 93.0 3.85 2.320 18.61 1.0 1.0 \n", "Hornet 4 Drive 21.4 6.0 258.0 110.0 3.08 3.215 19.44 1.0 0.0 \n", "Hornet Sportabout 18.7 8.0 360.0 175.0 3.15 3.440 17.02 0.0 0.0 \n", "Valiant 18.1 6.0 225.0 105.0 2.76 3.460 20.22 1.0 0.0 \n", "Duster 360 14.3 8.0 360.0 245.0 3.21 3.570 15.84 0.0 0.0 \n", "Merc 240D 24.4 4.0 146.7 62.0 3.69 3.190 20.00 1.0 0.0 \n", "Merc 230 22.8 4.0 140.8 95.0 3.92 3.150 22.90 1.0 0.0 \n", "Merc 280 19.2 6.0 167.6 123.0 3.92 3.440 18.30 1.0 0.0 \n", "Merc 280C 17.8 6.0 167.6 123.0 3.92 3.440 18.90 1.0 0.0 \n", "Merc 450SE 16.4 8.0 275.8 180.0 3.07 4.070 17.40 0.0 0.0 \n", "Merc 450SL 17.3 8.0 275.8 180.0 3.07 3.730 17.60 0.0 0.0 \n", "Merc 450SLC 15.2 8.0 275.8 180.0 3.07 3.780 18.00 0.0 0.0 \n", "Cadillac Fleetwood 10.4 8.0 472.0 205.0 2.93 5.250 17.98 0.0 0.0 \n", "Lincoln Continental 10.4 8.0 460.0 215.0 3.00 5.424 17.82 0.0 0.0 \n", "Chrysler Imperial 14.7 8.0 440.0 230.0 3.23 5.345 17.42 0.0 0.0 \n", "Fiat 128 32.4 4.0 78.7 66.0 4.08 2.200 19.47 1.0 1.0 \n", "Honda Civic 30.4 4.0 75.7 52.0 4.93 1.615 18.52 1.0 1.0 \n", "Toyota Corolla 33.9 4.0 71.1 65.0 4.22 1.835 19.90 1.0 1.0 \n", "Toyota Corona 21.5 4.0 120.1 97.0 3.70 2.465 20.01 1.0 0.0 \n", "Dodge Challenger 15.5 8.0 318.0 150.0 2.76 3.520 16.87 0.0 0.0 \n", "AMC Javelin 15.2 8.0 304.0 150.0 3.15 3.435 17.30 0.0 0.0 \n", "Camaro Z28 13.3 8.0 350.0 245.0 3.73 3.840 15.41 0.0 0.0 \n", "Pontiac Firebird 19.2 8.0 400.0 175.0 3.08 3.845 17.05 0.0 0.0 \n", "Fiat X1-9 27.3 4.0 79.0 66.0 4.08 1.935 18.90 1.0 1.0 \n", "Porsche 914-2 26.0 4.0 120.3 91.0 4.43 2.140 16.70 0.0 1.0 \n", "Lotus Europa 30.4 4.0 95.1 113.0 3.77 1.513 16.90 1.0 1.0 \n", "Ford Pantera L 15.8 8.0 351.0 264.0 4.22 3.170 14.50 0.0 1.0 \n", "Ferrari Dino 19.7 6.0 145.0 175.0 3.62 2.770 15.50 0.0 1.0 \n", "Maserati Bora 15.0 8.0 301.0 335.0 3.54 3.570 14.60 0.0 1.0 \n", "Volvo 142E NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "\n", " gear carb \n", "brand \n", "Mazda RX4 4.0 4.0 \n", "Mazda RX4 Wag 4.0 4.0 \n", "Datsun 710 4.0 1.0 \n", "Hornet 4 Drive 3.0 1.0 \n", "Hornet Sportabout 3.0 2.0 \n", "Valiant 3.0 1.0 \n", "Duster 360 3.0 4.0 \n", "Merc 240D 4.0 2.0 \n", "Merc 230 4.0 2.0 \n", "Merc 280 4.0 4.0 \n", "Merc 280C 4.0 4.0 \n", "Merc 450SE 3.0 3.0 \n", "Merc 450SL 3.0 3.0 \n", "Merc 450SLC 3.0 3.0 \n", "Cadillac Fleetwood 3.0 4.0 \n", "Lincoln Continental 3.0 4.0 \n", "Chrysler Imperial 3.0 4.0 \n", "Fiat 128 4.0 1.0 \n", "Honda Civic 4.0 2.0 \n", "Toyota Corolla 4.0 1.0 \n", "Toyota Corona 3.0 1.0 \n", "Dodge Challenger 3.0 2.0 \n", "AMC Javelin 3.0 2.0 \n", "Camaro Z28 3.0 4.0 \n", "Pontiac Firebird 3.0 2.0 \n", "Fiat X1-9 4.0 1.0 \n", "Porsche 914-2 5.0 2.0 \n", "Lotus Europa 5.0 2.0 \n", "Ford Pantera L 5.0 4.0 \n", "Ferrari Dino 5.0 6.0 \n", "Maserati Bora 5.0 8.0 \n", "Volvo 142E NaN NaN " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(path, index_col=\"brand\", sep=\",\")\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "Аналогично можно считывать данные из таблиц `excel` методом [read_excel](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html). \n", "\n", "Методами [to_csv](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html#pandas.DataFrame.to_csv) и [to_excel](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html#pandas.DataFrame.to_excel) можно сохранить `DataFrame` в таблицу удобном формате (для сохранения в `excel` необходимо поставить библиотеку [openpyxl](https://openpyxl.readthedocs.io/en/stable/) или её аналоги)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Индексация DataFrame\n", "\n", "Доступ к строкам `DataFrame` осуществляется в точности, как осуществляется доступ к элементам `Series`, т.е. используются методы `loc` и `iloc`, которые сразу возвращают строку в виде объекта `pd.Series`, у которого в качестве индекса выступают имена столбцов." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "mpg NaN\n", "cyl NaN\n", "disp NaN\n", "hp NaN\n", "drat NaN\n", "wt NaN\n", "qsec NaN\n", "vs NaN\n", "am NaN\n", "gear NaN\n", "carb NaN\n", "Name: Volvo 142E, dtype: float64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[\"Volvo 142E\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Оператор `[]` датафрейма предназначен для выбора столбцов. Передавать в него можно имя нужного столбца или список таких имен. Результат в первом случае --- столбец в виде `pd.Series`, а во втором случае --- таблица `pd.DataFrame` из указанных столбцов." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "brand\n", "Mazda RX4 21.0\n", "Mazda RX4 Wag 21.0\n", "Datsun 710 22.8\n", "Hornet 4 Drive 21.4\n", "Hornet Sportabout 18.7\n", "Name: mpg, dtype: float64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"mpg\"].head()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cylwt
brand
Lotus Europa4.01.513
Ford Pantera L8.03.170
Ferrari Dino6.02.770
Maserati Bora8.03.570
Volvo 142ENaNNaN
\n", "
" ], "text/plain": [ " cyl wt\n", "brand \n", "Lotus Europa 4.0 1.513\n", "Ford Pantera L 8.0 3.170\n", "Ferrari Dino 6.0 2.770\n", "Maserati Bora 8.0 3.570\n", "Volvo 142E NaN NaN" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[[\"cyl\", \"wt\"]].tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Если в названии столбца нет пробелов, то можно обращаться к столбцу, как к атрибуту объекта. " ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "brand\n", "Mazda RX4 4.0\n", "Mazda RX4 Wag 4.0\n", "Datsun 710 1.0\n", "Name: carb, dtype: float64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.carb.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Методы [DataFrame.at](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.at.html) и [DataFrame.iat](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iat.html#pandas.DataFrame.iat) позволяют получить одно значение на пересечении строки и столбца." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "21.0" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iat[0, 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Добавление столбцов \n", "\n", "Добавление и изменение столбцов в таблицу похоже на добавление элементов в словарь. При этом, если добавляется столбец с индексом (`pd.Series`), то данные автоматически выравниваются. \n", "\n", "Пусть поступили данные о стоимости автомобилей определенной марки." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcyldisphpdratwtqsecvsamgearcarbprice
brand
Mazda RX421.06.0160.0110.03.902.62016.460.01.04.04.0100000.0
Mazda RX4 Wag21.06.0160.0110.03.902.87517.020.01.04.04.0NaN
Datsun 71022.84.0108.093.03.852.32018.611.01.04.01.0NaN
Hornet 4 Drive21.46.0258.0110.03.083.21519.441.00.03.01.0NaN
Hornet Sportabout18.78.0360.0175.03.153.44017.020.00.03.02.0NaN
Valiant18.16.0225.0105.02.763.46020.221.00.03.01.0NaN
Duster 36014.38.0360.0245.03.213.57015.840.00.03.04.0NaN
Merc 240D24.44.0146.762.03.693.19020.001.00.04.02.0NaN
Merc 23022.84.0140.895.03.923.15022.901.00.04.02.0NaN
Merc 28019.26.0167.6123.03.923.44018.301.00.04.04.0NaN
Merc 280C17.86.0167.6123.03.923.44018.901.00.04.04.0NaN
Merc 450SE16.48.0275.8180.03.074.07017.400.00.03.03.0NaN
Merc 450SL17.38.0275.8180.03.073.73017.600.00.03.03.0NaN
Merc 450SLC15.28.0275.8180.03.073.78018.000.00.03.03.0NaN
Cadillac Fleetwood10.48.0472.0205.02.935.25017.980.00.03.04.0NaN
Lincoln Continental10.48.0460.0215.03.005.42417.820.00.03.04.0NaN
Chrysler Imperial14.78.0440.0230.03.235.34517.420.00.03.04.0NaN
Fiat 12832.44.078.766.04.082.20019.471.01.04.01.0NaN
Honda Civic30.44.075.752.04.931.61518.521.01.04.02.0NaN
Toyota Corolla33.94.071.165.04.221.83519.901.01.04.01.0NaN
Toyota Corona21.54.0120.197.03.702.46520.011.00.03.01.0NaN
Dodge Challenger15.58.0318.0150.02.763.52016.870.00.03.02.0NaN
AMC Javelin15.28.0304.0150.03.153.43517.300.00.03.02.0NaN
Camaro Z2813.38.0350.0245.03.733.84015.410.00.03.04.0NaN
Pontiac Firebird19.28.0400.0175.03.083.84517.050.00.03.02.0NaN
Fiat X1-927.34.079.066.04.081.93518.901.01.04.01.0NaN
Porsche 914-226.04.0120.391.04.432.14016.700.01.05.02.0NaN
Lotus Europa30.44.095.1113.03.771.51316.901.01.05.02.0120000.0
Ford Pantera L15.88.0351.0264.04.223.17014.500.01.05.04.0NaN
Ferrari Dino19.76.0145.0175.03.622.77015.500.01.05.06.0NaN
Maserati Bora15.08.0301.0335.03.543.57014.600.01.05.08.0NaN
Volvo 142ENaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
\n", "
" ], "text/plain": [ " mpg cyl disp hp drat wt qsec vs am \\\n", "brand \n", "Mazda RX4 21.0 6.0 160.0 110.0 3.90 2.620 16.46 0.0 1.0 \n", "Mazda RX4 Wag 21.0 6.0 160.0 110.0 3.90 2.875 17.02 0.0 1.0 \n", "Datsun 710 22.8 4.0 108.0 93.0 3.85 2.320 18.61 1.0 1.0 \n", "Hornet 4 Drive 21.4 6.0 258.0 110.0 3.08 3.215 19.44 1.0 0.0 \n", "Hornet Sportabout 18.7 8.0 360.0 175.0 3.15 3.440 17.02 0.0 0.0 \n", "Valiant 18.1 6.0 225.0 105.0 2.76 3.460 20.22 1.0 0.0 \n", "Duster 360 14.3 8.0 360.0 245.0 3.21 3.570 15.84 0.0 0.0 \n", "Merc 240D 24.4 4.0 146.7 62.0 3.69 3.190 20.00 1.0 0.0 \n", "Merc 230 22.8 4.0 140.8 95.0 3.92 3.150 22.90 1.0 0.0 \n", "Merc 280 19.2 6.0 167.6 123.0 3.92 3.440 18.30 1.0 0.0 \n", "Merc 280C 17.8 6.0 167.6 123.0 3.92 3.440 18.90 1.0 0.0 \n", "Merc 450SE 16.4 8.0 275.8 180.0 3.07 4.070 17.40 0.0 0.0 \n", "Merc 450SL 17.3 8.0 275.8 180.0 3.07 3.730 17.60 0.0 0.0 \n", "Merc 450SLC 15.2 8.0 275.8 180.0 3.07 3.780 18.00 0.0 0.0 \n", "Cadillac Fleetwood 10.4 8.0 472.0 205.0 2.93 5.250 17.98 0.0 0.0 \n", "Lincoln Continental 10.4 8.0 460.0 215.0 3.00 5.424 17.82 0.0 0.0 \n", "Chrysler Imperial 14.7 8.0 440.0 230.0 3.23 5.345 17.42 0.0 0.0 \n", "Fiat 128 32.4 4.0 78.7 66.0 4.08 2.200 19.47 1.0 1.0 \n", "Honda Civic 30.4 4.0 75.7 52.0 4.93 1.615 18.52 1.0 1.0 \n", "Toyota Corolla 33.9 4.0 71.1 65.0 4.22 1.835 19.90 1.0 1.0 \n", "Toyota Corona 21.5 4.0 120.1 97.0 3.70 2.465 20.01 1.0 0.0 \n", "Dodge Challenger 15.5 8.0 318.0 150.0 2.76 3.520 16.87 0.0 0.0 \n", "AMC Javelin 15.2 8.0 304.0 150.0 3.15 3.435 17.30 0.0 0.0 \n", "Camaro Z28 13.3 8.0 350.0 245.0 3.73 3.840 15.41 0.0 0.0 \n", "Pontiac Firebird 19.2 8.0 400.0 175.0 3.08 3.845 17.05 0.0 0.0 \n", "Fiat X1-9 27.3 4.0 79.0 66.0 4.08 1.935 18.90 1.0 1.0 \n", "Porsche 914-2 26.0 4.0 120.3 91.0 4.43 2.140 16.70 0.0 1.0 \n", "Lotus Europa 30.4 4.0 95.1 113.0 3.77 1.513 16.90 1.0 1.0 \n", "Ford Pantera L 15.8 8.0 351.0 264.0 4.22 3.170 14.50 0.0 1.0 \n", "Ferrari Dino 19.7 6.0 145.0 175.0 3.62 2.770 15.50 0.0 1.0 \n", "Maserati Bora 15.0 8.0 301.0 335.0 3.54 3.570 14.60 0.0 1.0 \n", "Volvo 142E NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "\n", " gear carb price \n", "brand \n", "Mazda RX4 4.0 4.0 100000.0 \n", "Mazda RX4 Wag 4.0 4.0 NaN \n", "Datsun 710 4.0 1.0 NaN \n", "Hornet 4 Drive 3.0 1.0 NaN \n", "Hornet Sportabout 3.0 2.0 NaN \n", "Valiant 3.0 1.0 NaN \n", "Duster 360 3.0 4.0 NaN \n", "Merc 240D 4.0 2.0 NaN \n", "Merc 230 4.0 2.0 NaN \n", "Merc 280 4.0 4.0 NaN \n", "Merc 280C 4.0 4.0 NaN \n", "Merc 450SE 3.0 3.0 NaN \n", "Merc 450SL 3.0 3.0 NaN \n", "Merc 450SLC 3.0 3.0 NaN \n", "Cadillac Fleetwood 3.0 4.0 NaN \n", "Lincoln Continental 3.0 4.0 NaN \n", "Chrysler Imperial 3.0 4.0 NaN \n", "Fiat 128 4.0 1.0 NaN \n", "Honda Civic 4.0 2.0 NaN \n", "Toyota Corolla 4.0 1.0 NaN \n", "Toyota Corona 3.0 1.0 NaN \n", "Dodge Challenger 3.0 2.0 NaN \n", "AMC Javelin 3.0 2.0 NaN \n", "Camaro Z28 3.0 4.0 NaN \n", "Pontiac Firebird 3.0 2.0 NaN \n", "Fiat X1-9 4.0 1.0 NaN \n", "Porsche 914-2 5.0 2.0 NaN \n", "Lotus Europa 5.0 2.0 120000.0 \n", "Ford Pantera L 5.0 4.0 NaN \n", "Ferrari Dino 5.0 6.0 NaN \n", "Maserati Bora 5.0 8.0 NaN \n", "Volvo 142E NaN NaN NaN " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "price = pd.Series({\n", " \"Mazda RX4\": 100_000, \n", " \"Lotus Europa\": 120_000\n", " })\n", "df[\"price\"] = price\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Пропущенные значения\n", "\n", "Большинство методов `pandas` разработаны таким образом, чтобы обрабатывать пропущенные значения. " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "mpg 20.048387\n", "cyl 6.258065\n", "disp 234.261290\n", "hp 147.903226\n", "drat 3.580000\n", "wt 3.231355\n", "qsec 17.824516\n", "vs 0.419355\n", "am 0.387097\n", "gear 3.677419\n", "carb 2.838710\n", "price 110000.000000\n", "dtype: float64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Тем не менее иногда необходимо избавиться от строк или столбцов с пропущенными данными. Для этого удобно использовать метод [DataFrame.dropna](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html), который возвращает новую таблицу, состоящую из строк исходной таблицы, не содержащих `np.nan`." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcyldisphpdratwtqsecvsamgearcarbprice
brand
Mazda RX421.06.0160.0110.03.902.62016.460.01.04.04.0100000.0
Lotus Europa30.44.095.1113.03.771.51316.901.01.05.02.0120000.0
\n", "
" ], "text/plain": [ " mpg cyl disp hp drat wt qsec vs am gear \\\n", "brand \n", "Mazda RX4 21.0 6.0 160.0 110.0 3.90 2.620 16.46 0.0 1.0 4.0 \n", "Lotus Europa 30.4 4.0 95.1 113.0 3.77 1.513 16.90 1.0 1.0 5.0 \n", "\n", " carb price \n", "brand \n", "Mazda RX4 4.0 100000.0 \n", "Lotus Europa 2.0 120000.0 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dropna()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Параметром `axis` можно выбрать, отбрасывать ли строки или столбцы. Параметр `how` может быть равен или `all` или `any` (по умолчанию `any`) и в зависимости от этого отбрасываются те строки таблицы (столбцы), в которых пропущены все или хотя бы одно значения. \n", "\n", "Вместо того чтобы отбросить пропущенные значения, методом [df.fillna](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html) можно их заполнить. В самом простом варианте он заполняет `na` во всех столбцах одним и тем же значением (по умолчанию не на месте)." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcyldisphpdratwtqsecvsamgearcarbprice
brand
Mazda RX421.06.0160.0110.03.902.62016.460.01.04.04.0100000.0
Mazda RX4 Wag21.06.0160.0110.03.902.87517.020.01.04.04.00.0
Datsun 71022.84.0108.093.03.852.32018.611.01.04.01.00.0
Hornet 4 Drive21.46.0258.0110.03.083.21519.441.00.03.01.00.0
Hornet Sportabout18.78.0360.0175.03.153.44017.020.00.03.02.00.0
Valiant18.16.0225.0105.02.763.46020.221.00.03.01.00.0
Duster 36014.38.0360.0245.03.213.57015.840.00.03.04.00.0
Merc 240D24.44.0146.762.03.693.19020.001.00.04.02.00.0
Merc 23022.84.0140.895.03.923.15022.901.00.04.02.00.0
Merc 28019.26.0167.6123.03.923.44018.301.00.04.04.00.0
Merc 280C17.86.0167.6123.03.923.44018.901.00.04.04.00.0
Merc 450SE16.48.0275.8180.03.074.07017.400.00.03.03.00.0
Merc 450SL17.38.0275.8180.03.073.73017.600.00.03.03.00.0
Merc 450SLC15.28.0275.8180.03.073.78018.000.00.03.03.00.0
Cadillac Fleetwood10.48.0472.0205.02.935.25017.980.00.03.04.00.0
Lincoln Continental10.48.0460.0215.03.005.42417.820.00.03.04.00.0
Chrysler Imperial14.78.0440.0230.03.235.34517.420.00.03.04.00.0
Fiat 12832.44.078.766.04.082.20019.471.01.04.01.00.0
Honda Civic30.44.075.752.04.931.61518.521.01.04.02.00.0
Toyota Corolla33.94.071.165.04.221.83519.901.01.04.01.00.0
Toyota Corona21.54.0120.197.03.702.46520.011.00.03.01.00.0
Dodge Challenger15.58.0318.0150.02.763.52016.870.00.03.02.00.0
AMC Javelin15.28.0304.0150.03.153.43517.300.00.03.02.00.0
Camaro Z2813.38.0350.0245.03.733.84015.410.00.03.04.00.0
Pontiac Firebird19.28.0400.0175.03.083.84517.050.00.03.02.00.0
Fiat X1-927.34.079.066.04.081.93518.901.01.04.01.00.0
Porsche 914-226.04.0120.391.04.432.14016.700.01.05.02.00.0
Lotus Europa30.44.095.1113.03.771.51316.901.01.05.02.0120000.0
Ford Pantera L15.88.0351.0264.04.223.17014.500.01.05.04.00.0
Ferrari Dino19.76.0145.0175.03.622.77015.500.01.05.06.00.0
Maserati Bora15.08.0301.0335.03.543.57014.600.01.05.08.00.0
Volvo 142E0.00.00.00.00.000.0000.000.00.00.00.00.0
\n", "
" ], "text/plain": [ " mpg cyl disp hp drat wt qsec vs am \\\n", "brand \n", "Mazda RX4 21.0 6.0 160.0 110.0 3.90 2.620 16.46 0.0 1.0 \n", "Mazda RX4 Wag 21.0 6.0 160.0 110.0 3.90 2.875 17.02 0.0 1.0 \n", "Datsun 710 22.8 4.0 108.0 93.0 3.85 2.320 18.61 1.0 1.0 \n", "Hornet 4 Drive 21.4 6.0 258.0 110.0 3.08 3.215 19.44 1.0 0.0 \n", "Hornet Sportabout 18.7 8.0 360.0 175.0 3.15 3.440 17.02 0.0 0.0 \n", "Valiant 18.1 6.0 225.0 105.0 2.76 3.460 20.22 1.0 0.0 \n", "Duster 360 14.3 8.0 360.0 245.0 3.21 3.570 15.84 0.0 0.0 \n", "Merc 240D 24.4 4.0 146.7 62.0 3.69 3.190 20.00 1.0 0.0 \n", "Merc 230 22.8 4.0 140.8 95.0 3.92 3.150 22.90 1.0 0.0 \n", "Merc 280 19.2 6.0 167.6 123.0 3.92 3.440 18.30 1.0 0.0 \n", "Merc 280C 17.8 6.0 167.6 123.0 3.92 3.440 18.90 1.0 0.0 \n", "Merc 450SE 16.4 8.0 275.8 180.0 3.07 4.070 17.40 0.0 0.0 \n", "Merc 450SL 17.3 8.0 275.8 180.0 3.07 3.730 17.60 0.0 0.0 \n", "Merc 450SLC 15.2 8.0 275.8 180.0 3.07 3.780 18.00 0.0 0.0 \n", "Cadillac Fleetwood 10.4 8.0 472.0 205.0 2.93 5.250 17.98 0.0 0.0 \n", "Lincoln Continental 10.4 8.0 460.0 215.0 3.00 5.424 17.82 0.0 0.0 \n", "Chrysler Imperial 14.7 8.0 440.0 230.0 3.23 5.345 17.42 0.0 0.0 \n", "Fiat 128 32.4 4.0 78.7 66.0 4.08 2.200 19.47 1.0 1.0 \n", "Honda Civic 30.4 4.0 75.7 52.0 4.93 1.615 18.52 1.0 1.0 \n", "Toyota Corolla 33.9 4.0 71.1 65.0 4.22 1.835 19.90 1.0 1.0 \n", "Toyota Corona 21.5 4.0 120.1 97.0 3.70 2.465 20.01 1.0 0.0 \n", "Dodge Challenger 15.5 8.0 318.0 150.0 2.76 3.520 16.87 0.0 0.0 \n", "AMC Javelin 15.2 8.0 304.0 150.0 3.15 3.435 17.30 0.0 0.0 \n", "Camaro Z28 13.3 8.0 350.0 245.0 3.73 3.840 15.41 0.0 0.0 \n", "Pontiac Firebird 19.2 8.0 400.0 175.0 3.08 3.845 17.05 0.0 0.0 \n", "Fiat X1-9 27.3 4.0 79.0 66.0 4.08 1.935 18.90 1.0 1.0 \n", "Porsche 914-2 26.0 4.0 120.3 91.0 4.43 2.140 16.70 0.0 1.0 \n", "Lotus Europa 30.4 4.0 95.1 113.0 3.77 1.513 16.90 1.0 1.0 \n", "Ford Pantera L 15.8 8.0 351.0 264.0 4.22 3.170 14.50 0.0 1.0 \n", "Ferrari Dino 19.7 6.0 145.0 175.0 3.62 2.770 15.50 0.0 1.0 \n", "Maserati Bora 15.0 8.0 301.0 335.0 3.54 3.570 14.60 0.0 1.0 \n", "Volvo 142E 0.0 0.0 0.0 0.0 0.00 0.000 0.00 0.0 0.0 \n", "\n", " gear carb price \n", "brand \n", "Mazda RX4 4.0 4.0 100000.0 \n", "Mazda RX4 Wag 4.0 4.0 0.0 \n", "Datsun 710 4.0 1.0 0.0 \n", "Hornet 4 Drive 3.0 1.0 0.0 \n", "Hornet Sportabout 3.0 2.0 0.0 \n", "Valiant 3.0 1.0 0.0 \n", "Duster 360 3.0 4.0 0.0 \n", "Merc 240D 4.0 2.0 0.0 \n", "Merc 230 4.0 2.0 0.0 \n", "Merc 280 4.0 4.0 0.0 \n", "Merc 280C 4.0 4.0 0.0 \n", "Merc 450SE 3.0 3.0 0.0 \n", "Merc 450SL 3.0 3.0 0.0 \n", "Merc 450SLC 3.0 3.0 0.0 \n", "Cadillac Fleetwood 3.0 4.0 0.0 \n", "Lincoln Continental 3.0 4.0 0.0 \n", "Chrysler Imperial 3.0 4.0 0.0 \n", "Fiat 128 4.0 1.0 0.0 \n", "Honda Civic 4.0 2.0 0.0 \n", "Toyota Corolla 4.0 1.0 0.0 \n", "Toyota Corona 3.0 1.0 0.0 \n", "Dodge Challenger 3.0 2.0 0.0 \n", "AMC Javelin 3.0 2.0 0.0 \n", "Camaro Z28 3.0 4.0 0.0 \n", "Pontiac Firebird 3.0 2.0 0.0 \n", "Fiat X1-9 4.0 1.0 0.0 \n", "Porsche 914-2 5.0 2.0 0.0 \n", "Lotus Europa 5.0 2.0 120000.0 \n", "Ford Pantera L 5.0 4.0 0.0 \n", "Ferrari Dino 5.0 6.0 0.0 \n", "Maserati Bora 5.0 8.0 0.0 \n", "Volvo 142E 0.0 0.0 0.0 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.fillna(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Можно настроить более гибкое заполнение. Например, передавая словарь, можно выбрать свой заполнитель для каждого столбца. А параметром `method` можно заполнять пропущенные значения ближайшим непропущенным в строчках выше или ниже.\n", "\n", "Методом [DataFrame.interpolate](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html) можно заполнить пропущенные значения интерполируя по непропущенным." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcyldisphpdratwtqsecvsamgearcarbprice
brand
Mazda RX421.06.0160.0110.03.902.62016.460.01.04.04.0100000.000000
Mazda RX4 Wag21.06.0160.0110.03.902.87517.020.01.04.04.0100740.740741
Datsun 71022.84.0108.093.03.852.32018.611.01.04.01.0101481.481481
Hornet 4 Drive21.46.0258.0110.03.083.21519.441.00.03.01.0102222.222222
Hornet Sportabout18.78.0360.0175.03.153.44017.020.00.03.02.0102962.962963
Valiant18.16.0225.0105.02.763.46020.221.00.03.01.0103703.703704
Duster 36014.38.0360.0245.03.213.57015.840.00.03.04.0104444.444444
Merc 240D24.44.0146.762.03.693.19020.001.00.04.02.0105185.185185
Merc 23022.84.0140.895.03.923.15022.901.00.04.02.0105925.925926
Merc 28019.26.0167.6123.03.923.44018.301.00.04.04.0106666.666667
Merc 280C17.86.0167.6123.03.923.44018.901.00.04.04.0107407.407407
Merc 450SE16.48.0275.8180.03.074.07017.400.00.03.03.0108148.148148
Merc 450SL17.38.0275.8180.03.073.73017.600.00.03.03.0108888.888889
Merc 450SLC15.28.0275.8180.03.073.78018.000.00.03.03.0109629.629630
Cadillac Fleetwood10.48.0472.0205.02.935.25017.980.00.03.04.0110370.370370
Lincoln Continental10.48.0460.0215.03.005.42417.820.00.03.04.0111111.111111
Chrysler Imperial14.78.0440.0230.03.235.34517.420.00.03.04.0111851.851852
Fiat 12832.44.078.766.04.082.20019.471.01.04.01.0112592.592593
Honda Civic30.44.075.752.04.931.61518.521.01.04.02.0113333.333333
Toyota Corolla33.94.071.165.04.221.83519.901.01.04.01.0114074.074074
Toyota Corona21.54.0120.197.03.702.46520.011.00.03.01.0114814.814815
Dodge Challenger15.58.0318.0150.02.763.52016.870.00.03.02.0115555.555556
AMC Javelin15.28.0304.0150.03.153.43517.300.00.03.02.0116296.296296
Camaro Z2813.38.0350.0245.03.733.84015.410.00.03.04.0117037.037037
Pontiac Firebird19.28.0400.0175.03.083.84517.050.00.03.02.0117777.777778
Fiat X1-927.34.079.066.04.081.93518.901.01.04.01.0118518.518519
Porsche 914-226.04.0120.391.04.432.14016.700.01.05.02.0119259.259259
Lotus Europa30.44.095.1113.03.771.51316.901.01.05.02.0120000.000000
Ford Pantera L15.88.0351.0264.04.223.17014.500.01.05.04.0120000.000000
Ferrari Dino19.76.0145.0175.03.622.77015.500.01.05.06.0120000.000000
Maserati Bora15.08.0301.0335.03.543.57014.600.01.05.08.0120000.000000
Volvo 142E15.08.0301.0335.03.543.57014.600.01.05.08.0120000.000000
\n", "
" ], "text/plain": [ " mpg cyl disp hp drat wt qsec vs am \\\n", "brand \n", "Mazda RX4 21.0 6.0 160.0 110.0 3.90 2.620 16.46 0.0 1.0 \n", "Mazda RX4 Wag 21.0 6.0 160.0 110.0 3.90 2.875 17.02 0.0 1.0 \n", "Datsun 710 22.8 4.0 108.0 93.0 3.85 2.320 18.61 1.0 1.0 \n", "Hornet 4 Drive 21.4 6.0 258.0 110.0 3.08 3.215 19.44 1.0 0.0 \n", "Hornet Sportabout 18.7 8.0 360.0 175.0 3.15 3.440 17.02 0.0 0.0 \n", "Valiant 18.1 6.0 225.0 105.0 2.76 3.460 20.22 1.0 0.0 \n", "Duster 360 14.3 8.0 360.0 245.0 3.21 3.570 15.84 0.0 0.0 \n", "Merc 240D 24.4 4.0 146.7 62.0 3.69 3.190 20.00 1.0 0.0 \n", "Merc 230 22.8 4.0 140.8 95.0 3.92 3.150 22.90 1.0 0.0 \n", "Merc 280 19.2 6.0 167.6 123.0 3.92 3.440 18.30 1.0 0.0 \n", "Merc 280C 17.8 6.0 167.6 123.0 3.92 3.440 18.90 1.0 0.0 \n", "Merc 450SE 16.4 8.0 275.8 180.0 3.07 4.070 17.40 0.0 0.0 \n", "Merc 450SL 17.3 8.0 275.8 180.0 3.07 3.730 17.60 0.0 0.0 \n", "Merc 450SLC 15.2 8.0 275.8 180.0 3.07 3.780 18.00 0.0 0.0 \n", "Cadillac Fleetwood 10.4 8.0 472.0 205.0 2.93 5.250 17.98 0.0 0.0 \n", "Lincoln Continental 10.4 8.0 460.0 215.0 3.00 5.424 17.82 0.0 0.0 \n", "Chrysler Imperial 14.7 8.0 440.0 230.0 3.23 5.345 17.42 0.0 0.0 \n", "Fiat 128 32.4 4.0 78.7 66.0 4.08 2.200 19.47 1.0 1.0 \n", "Honda Civic 30.4 4.0 75.7 52.0 4.93 1.615 18.52 1.0 1.0 \n", "Toyota Corolla 33.9 4.0 71.1 65.0 4.22 1.835 19.90 1.0 1.0 \n", "Toyota Corona 21.5 4.0 120.1 97.0 3.70 2.465 20.01 1.0 0.0 \n", "Dodge Challenger 15.5 8.0 318.0 150.0 2.76 3.520 16.87 0.0 0.0 \n", "AMC Javelin 15.2 8.0 304.0 150.0 3.15 3.435 17.30 0.0 0.0 \n", "Camaro Z28 13.3 8.0 350.0 245.0 3.73 3.840 15.41 0.0 0.0 \n", "Pontiac Firebird 19.2 8.0 400.0 175.0 3.08 3.845 17.05 0.0 0.0 \n", "Fiat X1-9 27.3 4.0 79.0 66.0 4.08 1.935 18.90 1.0 1.0 \n", "Porsche 914-2 26.0 4.0 120.3 91.0 4.43 2.140 16.70 0.0 1.0 \n", "Lotus Europa 30.4 4.0 95.1 113.0 3.77 1.513 16.90 1.0 1.0 \n", "Ford Pantera L 15.8 8.0 351.0 264.0 4.22 3.170 14.50 0.0 1.0 \n", "Ferrari Dino 19.7 6.0 145.0 175.0 3.62 2.770 15.50 0.0 1.0 \n", "Maserati Bora 15.0 8.0 301.0 335.0 3.54 3.570 14.60 0.0 1.0 \n", "Volvo 142E 15.0 8.0 301.0 335.0 3.54 3.570 14.60 0.0 1.0 \n", "\n", " gear carb price \n", "brand \n", "Mazda RX4 4.0 4.0 100000.000000 \n", "Mazda RX4 Wag 4.0 4.0 100740.740741 \n", "Datsun 710 4.0 1.0 101481.481481 \n", "Hornet 4 Drive 3.0 1.0 102222.222222 \n", "Hornet Sportabout 3.0 2.0 102962.962963 \n", "Valiant 3.0 1.0 103703.703704 \n", "Duster 360 3.0 4.0 104444.444444 \n", "Merc 240D 4.0 2.0 105185.185185 \n", "Merc 230 4.0 2.0 105925.925926 \n", "Merc 280 4.0 4.0 106666.666667 \n", "Merc 280C 4.0 4.0 107407.407407 \n", "Merc 450SE 3.0 3.0 108148.148148 \n", "Merc 450SL 3.0 3.0 108888.888889 \n", "Merc 450SLC 3.0 3.0 109629.629630 \n", "Cadillac Fleetwood 3.0 4.0 110370.370370 \n", "Lincoln Continental 3.0 4.0 111111.111111 \n", "Chrysler Imperial 3.0 4.0 111851.851852 \n", "Fiat 128 4.0 1.0 112592.592593 \n", "Honda Civic 4.0 2.0 113333.333333 \n", "Toyota Corolla 4.0 1.0 114074.074074 \n", "Toyota Corona 3.0 1.0 114814.814815 \n", "Dodge Challenger 3.0 2.0 115555.555556 \n", "AMC Javelin 3.0 2.0 116296.296296 \n", "Camaro Z28 3.0 4.0 117037.037037 \n", "Pontiac Firebird 3.0 2.0 117777.777778 \n", "Fiat X1-9 4.0 1.0 118518.518519 \n", "Porsche 914-2 5.0 2.0 119259.259259 \n", "Lotus Europa 5.0 2.0 120000.000000 \n", "Ford Pantera L 5.0 4.0 120000.000000 \n", "Ferrari Dino 5.0 6.0 120000.000000 \n", "Maserati Bora 5.0 8.0 120000.000000 \n", "Volvo 142E 5.0 8.0 120000.000000 " ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.interpolate(method=\"linear\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## apply, map, applymap\n", "\n", "Применить некоторую функцию к всем строкам/столбцам таблицы можно методом [DataFrame.apply](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html). Передавать этому методу необходимо такую функцию, рассчитывая, что ей на вход будет подаваться объект `pd.series`, который в зависимости от значения параметра `axis` будет или строкой или столбцом таблицы." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "mpg 23.50\n", "cyl 4.00\n", "disp 400.90\n", "hp 283.00\n", "drat 2.17\n", "dtype: float64" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def f(x):\n", " return x.max() - x.min() \n", "\n", "df.apply(f, axis=\"rows\").head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Метод [Series.map](https://pandas.pydata.org/docs/reference/api/pandas.Series.map.html) позволяет применить функцию к каждому значению столбца. Чтобы продемонстрировать принцип работы этого метода, извлечем из столбца `brand` марку и модель автомобиля в разные столбцы.\n", "\n", "Для этого определим функции, которые в качестве аргумента принимают строку из столбца `brand` и возвращают марку или модель автомобиля из неё. В данной таблице, марка автомобиля всегда встречается до первого пробела, а модель автомобиля --- все после первого пробела." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mazda RX4 Wag -> Mazda and RX4 Wag\n" ] } ], "source": [ "def get_brand(x):\n", " return x.split(\" \")[0]\n", "\n", "def get_model(x):\n", " b, *m = x.split(\" \") # все после первого пробела попадает в переменную m\n", " return \" \".join(m)\n", "\n", "x = \"Mazda RX4 Wag\"\n", "print(f\"{x} -> {get_brand(x)} and {get_model(x)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Определив эти функции применим их к столбцу `brand`. Т.к. `brand` --- индекс, то создадим из него `pd.Series`, применим определенные функции методом `map` и запишем результаты в таблицу в качестве новых столбцов." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcyldisphpdratwtqsecvsamgearcarbpricebrandmodel
car
Mazda RX421.06.0160.0110.03.902.62016.460.01.04.04.0100000.0MazdaRX4
Mazda RX4 Wag21.06.0160.0110.03.902.87517.020.01.04.04.0NaNMazdaRX4 Wag
Datsun 71022.84.0108.093.03.852.32018.611.01.04.01.0NaNDatsun710
Hornet 4 Drive21.46.0258.0110.03.083.21519.441.00.03.01.0NaNHornet4 Drive
Hornet Sportabout18.78.0360.0175.03.153.44017.020.00.03.02.0NaNHornetSportabout
Valiant18.16.0225.0105.02.763.46020.221.00.03.01.0NaNValiant
Duster 36014.38.0360.0245.03.213.57015.840.00.03.04.0NaNDuster360
Merc 240D24.44.0146.762.03.693.19020.001.00.04.02.0NaNMerc240D
Merc 23022.84.0140.895.03.923.15022.901.00.04.02.0NaNMerc230
Merc 28019.26.0167.6123.03.923.44018.301.00.04.04.0NaNMerc280
Merc 280C17.86.0167.6123.03.923.44018.901.00.04.04.0NaNMerc280C
Merc 450SE16.48.0275.8180.03.074.07017.400.00.03.03.0NaNMerc450SE
Merc 450SL17.38.0275.8180.03.073.73017.600.00.03.03.0NaNMerc450SL
Merc 450SLC15.28.0275.8180.03.073.78018.000.00.03.03.0NaNMerc450SLC
Cadillac Fleetwood10.48.0472.0205.02.935.25017.980.00.03.04.0NaNCadillacFleetwood
Lincoln Continental10.48.0460.0215.03.005.42417.820.00.03.04.0NaNLincolnContinental
Chrysler Imperial14.78.0440.0230.03.235.34517.420.00.03.04.0NaNChryslerImperial
Fiat 12832.44.078.766.04.082.20019.471.01.04.01.0NaNFiat128
Honda Civic30.44.075.752.04.931.61518.521.01.04.02.0NaNHondaCivic
Toyota Corolla33.94.071.165.04.221.83519.901.01.04.01.0NaNToyotaCorolla
Toyota Corona21.54.0120.197.03.702.46520.011.00.03.01.0NaNToyotaCorona
Dodge Challenger15.58.0318.0150.02.763.52016.870.00.03.02.0NaNDodgeChallenger
AMC Javelin15.28.0304.0150.03.153.43517.300.00.03.02.0NaNAMCJavelin
Camaro Z2813.38.0350.0245.03.733.84015.410.00.03.04.0NaNCamaroZ28
Pontiac Firebird19.28.0400.0175.03.083.84517.050.00.03.02.0NaNPontiacFirebird
Fiat X1-927.34.079.066.04.081.93518.901.01.04.01.0NaNFiatX1-9
Porsche 914-226.04.0120.391.04.432.14016.700.01.05.02.0NaNPorsche914-2
Lotus Europa30.44.095.1113.03.771.51316.901.01.05.02.0120000.0LotusEuropa
Ford Pantera L15.88.0351.0264.04.223.17014.500.01.05.04.0NaNFordPantera L
Ferrari Dino19.76.0145.0175.03.622.77015.500.01.05.06.0NaNFerrariDino
Maserati Bora15.08.0301.0335.03.543.57014.600.01.05.08.0NaNMaseratiBora
Volvo 142ENaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNVolvo142E
\n", "
" ], "text/plain": [ " mpg cyl disp hp drat wt qsec vs am \\\n", "car \n", "Mazda RX4 21.0 6.0 160.0 110.0 3.90 2.620 16.46 0.0 1.0 \n", "Mazda RX4 Wag 21.0 6.0 160.0 110.0 3.90 2.875 17.02 0.0 1.0 \n", "Datsun 710 22.8 4.0 108.0 93.0 3.85 2.320 18.61 1.0 1.0 \n", "Hornet 4 Drive 21.4 6.0 258.0 110.0 3.08 3.215 19.44 1.0 0.0 \n", "Hornet Sportabout 18.7 8.0 360.0 175.0 3.15 3.440 17.02 0.0 0.0 \n", "Valiant 18.1 6.0 225.0 105.0 2.76 3.460 20.22 1.0 0.0 \n", "Duster 360 14.3 8.0 360.0 245.0 3.21 3.570 15.84 0.0 0.0 \n", "Merc 240D 24.4 4.0 146.7 62.0 3.69 3.190 20.00 1.0 0.0 \n", "Merc 230 22.8 4.0 140.8 95.0 3.92 3.150 22.90 1.0 0.0 \n", "Merc 280 19.2 6.0 167.6 123.0 3.92 3.440 18.30 1.0 0.0 \n", "Merc 280C 17.8 6.0 167.6 123.0 3.92 3.440 18.90 1.0 0.0 \n", "Merc 450SE 16.4 8.0 275.8 180.0 3.07 4.070 17.40 0.0 0.0 \n", "Merc 450SL 17.3 8.0 275.8 180.0 3.07 3.730 17.60 0.0 0.0 \n", "Merc 450SLC 15.2 8.0 275.8 180.0 3.07 3.780 18.00 0.0 0.0 \n", "Cadillac Fleetwood 10.4 8.0 472.0 205.0 2.93 5.250 17.98 0.0 0.0 \n", "Lincoln Continental 10.4 8.0 460.0 215.0 3.00 5.424 17.82 0.0 0.0 \n", "Chrysler Imperial 14.7 8.0 440.0 230.0 3.23 5.345 17.42 0.0 0.0 \n", "Fiat 128 32.4 4.0 78.7 66.0 4.08 2.200 19.47 1.0 1.0 \n", "Honda Civic 30.4 4.0 75.7 52.0 4.93 1.615 18.52 1.0 1.0 \n", "Toyota Corolla 33.9 4.0 71.1 65.0 4.22 1.835 19.90 1.0 1.0 \n", "Toyota Corona 21.5 4.0 120.1 97.0 3.70 2.465 20.01 1.0 0.0 \n", "Dodge Challenger 15.5 8.0 318.0 150.0 2.76 3.520 16.87 0.0 0.0 \n", "AMC Javelin 15.2 8.0 304.0 150.0 3.15 3.435 17.30 0.0 0.0 \n", "Camaro Z28 13.3 8.0 350.0 245.0 3.73 3.840 15.41 0.0 0.0 \n", "Pontiac Firebird 19.2 8.0 400.0 175.0 3.08 3.845 17.05 0.0 0.0 \n", "Fiat X1-9 27.3 4.0 79.0 66.0 4.08 1.935 18.90 1.0 1.0 \n", "Porsche 914-2 26.0 4.0 120.3 91.0 4.43 2.140 16.70 0.0 1.0 \n", "Lotus Europa 30.4 4.0 95.1 113.0 3.77 1.513 16.90 1.0 1.0 \n", "Ford Pantera L 15.8 8.0 351.0 264.0 4.22 3.170 14.50 0.0 1.0 \n", "Ferrari Dino 19.7 6.0 145.0 175.0 3.62 2.770 15.50 0.0 1.0 \n", "Maserati Bora 15.0 8.0 301.0 335.0 3.54 3.570 14.60 0.0 1.0 \n", "Volvo 142E NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "\n", " gear carb price brand model \n", "car \n", "Mazda RX4 4.0 4.0 100000.0 Mazda RX4 \n", "Mazda RX4 Wag 4.0 4.0 NaN Mazda RX4 Wag \n", "Datsun 710 4.0 1.0 NaN Datsun 710 \n", "Hornet 4 Drive 3.0 1.0 NaN Hornet 4 Drive \n", "Hornet Sportabout 3.0 2.0 NaN Hornet Sportabout \n", "Valiant 3.0 1.0 NaN Valiant \n", "Duster 360 3.0 4.0 NaN Duster 360 \n", "Merc 240D 4.0 2.0 NaN Merc 240D \n", "Merc 230 4.0 2.0 NaN Merc 230 \n", "Merc 280 4.0 4.0 NaN Merc 280 \n", "Merc 280C 4.0 4.0 NaN Merc 280C \n", "Merc 450SE 3.0 3.0 NaN Merc 450SE \n", "Merc 450SL 3.0 3.0 NaN Merc 450SL \n", "Merc 450SLC 3.0 3.0 NaN Merc 450SLC \n", "Cadillac Fleetwood 3.0 4.0 NaN Cadillac Fleetwood \n", "Lincoln Continental 3.0 4.0 NaN Lincoln Continental \n", "Chrysler Imperial 3.0 4.0 NaN Chrysler Imperial \n", "Fiat 128 4.0 1.0 NaN Fiat 128 \n", "Honda Civic 4.0 2.0 NaN Honda Civic \n", "Toyota Corolla 4.0 1.0 NaN Toyota Corolla \n", "Toyota Corona 3.0 1.0 NaN Toyota Corona \n", "Dodge Challenger 3.0 2.0 NaN Dodge Challenger \n", "AMC Javelin 3.0 2.0 NaN AMC Javelin \n", "Camaro Z28 3.0 4.0 NaN Camaro Z28 \n", "Pontiac Firebird 3.0 2.0 NaN Pontiac Firebird \n", "Fiat X1-9 4.0 1.0 NaN Fiat X1-9 \n", "Porsche 914-2 5.0 2.0 NaN Porsche 914-2 \n", "Lotus Europa 5.0 2.0 120000.0 Lotus Europa \n", "Ford Pantera L 5.0 4.0 NaN Ford Pantera L \n", "Ferrari Dino 5.0 6.0 NaN Ferrari Dino \n", "Maserati Bora 5.0 8.0 NaN Maserati Bora \n", "Volvo 142E NaN NaN NaN Volvo 142E " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fullname = pd.Series(df.index, index=df.index)\n", "df.index.rename(\"car\", inplace=True)\n", "df[\"brand\"] = fullname.map(get_brand)\n", "df[\"model\"] = fullname.map(get_model)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Вместо функции, методу `map` можно передать словарь." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcyldisphpdratwtqsecvsamgearcarbpricebrandmodel
car
Mazda RX421.0six160.0110.03.902.62016.460.01.04.04.0100000.0MazdaRX4
Mazda RX4 Wag21.0six160.0110.03.902.87517.020.01.04.04.0NaNMazdaRX4 Wag
Datsun 71022.8four108.093.03.852.32018.611.01.04.01.0NaNDatsun710
Hornet 4 Drive21.4six258.0110.03.083.21519.441.00.03.01.0NaNHornet4 Drive
Hornet Sportabout18.7eight360.0175.03.153.44017.020.00.03.02.0NaNHornetSportabout
Valiant18.1six225.0105.02.763.46020.221.00.03.01.0NaNValiant
Duster 36014.3eight360.0245.03.213.57015.840.00.03.04.0NaNDuster360
Merc 240D24.4four146.762.03.693.19020.001.00.04.02.0NaNMerc240D
Merc 23022.8four140.895.03.923.15022.901.00.04.02.0NaNMerc230
Merc 28019.2six167.6123.03.923.44018.301.00.04.04.0NaNMerc280
Merc 280C17.8six167.6123.03.923.44018.901.00.04.04.0NaNMerc280C
Merc 450SE16.4eight275.8180.03.074.07017.400.00.03.03.0NaNMerc450SE
Merc 450SL17.3eight275.8180.03.073.73017.600.00.03.03.0NaNMerc450SL
Merc 450SLC15.2eight275.8180.03.073.78018.000.00.03.03.0NaNMerc450SLC
Cadillac Fleetwood10.4eight472.0205.02.935.25017.980.00.03.04.0NaNCadillacFleetwood
Lincoln Continental10.4eight460.0215.03.005.42417.820.00.03.04.0NaNLincolnContinental
Chrysler Imperial14.7eight440.0230.03.235.34517.420.00.03.04.0NaNChryslerImperial
Fiat 12832.4four78.766.04.082.20019.471.01.04.01.0NaNFiat128
Honda Civic30.4four75.752.04.931.61518.521.01.04.02.0NaNHondaCivic
Toyota Corolla33.9four71.165.04.221.83519.901.01.04.01.0NaNToyotaCorolla
Toyota Corona21.5four120.197.03.702.46520.011.00.03.01.0NaNToyotaCorona
Dodge Challenger15.5eight318.0150.02.763.52016.870.00.03.02.0NaNDodgeChallenger
AMC Javelin15.2eight304.0150.03.153.43517.300.00.03.02.0NaNAMCJavelin
Camaro Z2813.3eight350.0245.03.733.84015.410.00.03.04.0NaNCamaroZ28
Pontiac Firebird19.2eight400.0175.03.083.84517.050.00.03.02.0NaNPontiacFirebird
Fiat X1-927.3four79.066.04.081.93518.901.01.04.01.0NaNFiatX1-9
Porsche 914-226.0four120.391.04.432.14016.700.01.05.02.0NaNPorsche914-2
Lotus Europa30.4four95.1113.03.771.51316.901.01.05.02.0120000.0LotusEuropa
Ford Pantera L15.8eight351.0264.04.223.17014.500.01.05.04.0NaNFordPantera L
Ferrari Dino19.7six145.0175.03.622.77015.500.01.05.06.0NaNFerrariDino
Maserati Bora15.0eight301.0335.03.543.57014.600.01.05.08.0NaNMaseratiBora
Volvo 142ENaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNVolvo142E
\n", "
" ], "text/plain": [ " mpg cyl disp hp drat wt qsec vs am \\\n", "car \n", "Mazda RX4 21.0 six 160.0 110.0 3.90 2.620 16.46 0.0 1.0 \n", "Mazda RX4 Wag 21.0 six 160.0 110.0 3.90 2.875 17.02 0.0 1.0 \n", "Datsun 710 22.8 four 108.0 93.0 3.85 2.320 18.61 1.0 1.0 \n", "Hornet 4 Drive 21.4 six 258.0 110.0 3.08 3.215 19.44 1.0 0.0 \n", "Hornet Sportabout 18.7 eight 360.0 175.0 3.15 3.440 17.02 0.0 0.0 \n", "Valiant 18.1 six 225.0 105.0 2.76 3.460 20.22 1.0 0.0 \n", "Duster 360 14.3 eight 360.0 245.0 3.21 3.570 15.84 0.0 0.0 \n", "Merc 240D 24.4 four 146.7 62.0 3.69 3.190 20.00 1.0 0.0 \n", "Merc 230 22.8 four 140.8 95.0 3.92 3.150 22.90 1.0 0.0 \n", "Merc 280 19.2 six 167.6 123.0 3.92 3.440 18.30 1.0 0.0 \n", "Merc 280C 17.8 six 167.6 123.0 3.92 3.440 18.90 1.0 0.0 \n", "Merc 450SE 16.4 eight 275.8 180.0 3.07 4.070 17.40 0.0 0.0 \n", "Merc 450SL 17.3 eight 275.8 180.0 3.07 3.730 17.60 0.0 0.0 \n", "Merc 450SLC 15.2 eight 275.8 180.0 3.07 3.780 18.00 0.0 0.0 \n", "Cadillac Fleetwood 10.4 eight 472.0 205.0 2.93 5.250 17.98 0.0 0.0 \n", "Lincoln Continental 10.4 eight 460.0 215.0 3.00 5.424 17.82 0.0 0.0 \n", "Chrysler Imperial 14.7 eight 440.0 230.0 3.23 5.345 17.42 0.0 0.0 \n", "Fiat 128 32.4 four 78.7 66.0 4.08 2.200 19.47 1.0 1.0 \n", "Honda Civic 30.4 four 75.7 52.0 4.93 1.615 18.52 1.0 1.0 \n", "Toyota Corolla 33.9 four 71.1 65.0 4.22 1.835 19.90 1.0 1.0 \n", "Toyota Corona 21.5 four 120.1 97.0 3.70 2.465 20.01 1.0 0.0 \n", "Dodge Challenger 15.5 eight 318.0 150.0 2.76 3.520 16.87 0.0 0.0 \n", "AMC Javelin 15.2 eight 304.0 150.0 3.15 3.435 17.30 0.0 0.0 \n", "Camaro Z28 13.3 eight 350.0 245.0 3.73 3.840 15.41 0.0 0.0 \n", "Pontiac Firebird 19.2 eight 400.0 175.0 3.08 3.845 17.05 0.0 0.0 \n", "Fiat X1-9 27.3 four 79.0 66.0 4.08 1.935 18.90 1.0 1.0 \n", "Porsche 914-2 26.0 four 120.3 91.0 4.43 2.140 16.70 0.0 1.0 \n", "Lotus Europa 30.4 four 95.1 113.0 3.77 1.513 16.90 1.0 1.0 \n", "Ford Pantera L 15.8 eight 351.0 264.0 4.22 3.170 14.50 0.0 1.0 \n", "Ferrari Dino 19.7 six 145.0 175.0 3.62 2.770 15.50 0.0 1.0 \n", "Maserati Bora 15.0 eight 301.0 335.0 3.54 3.570 14.60 0.0 1.0 \n", "Volvo 142E NaN NaN NaN NaN NaN NaN NaN NaN NaN \n", "\n", " gear carb price brand model \n", "car \n", "Mazda RX4 4.0 4.0 100000.0 Mazda RX4 \n", "Mazda RX4 Wag 4.0 4.0 NaN Mazda RX4 Wag \n", "Datsun 710 4.0 1.0 NaN Datsun 710 \n", "Hornet 4 Drive 3.0 1.0 NaN Hornet 4 Drive \n", "Hornet Sportabout 3.0 2.0 NaN Hornet Sportabout \n", "Valiant 3.0 1.0 NaN Valiant \n", "Duster 360 3.0 4.0 NaN Duster 360 \n", "Merc 240D 4.0 2.0 NaN Merc 240D \n", "Merc 230 4.0 2.0 NaN Merc 230 \n", "Merc 280 4.0 4.0 NaN Merc 280 \n", "Merc 280C 4.0 4.0 NaN Merc 280C \n", "Merc 450SE 3.0 3.0 NaN Merc 450SE \n", "Merc 450SL 3.0 3.0 NaN Merc 450SL \n", "Merc 450SLC 3.0 3.0 NaN Merc 450SLC \n", "Cadillac Fleetwood 3.0 4.0 NaN Cadillac Fleetwood \n", "Lincoln Continental 3.0 4.0 NaN Lincoln Continental \n", "Chrysler Imperial 3.0 4.0 NaN Chrysler Imperial \n", "Fiat 128 4.0 1.0 NaN Fiat 128 \n", "Honda Civic 4.0 2.0 NaN Honda Civic \n", "Toyota Corolla 4.0 1.0 NaN Toyota Corolla \n", "Toyota Corona 3.0 1.0 NaN Toyota Corona \n", "Dodge Challenger 3.0 2.0 NaN Dodge Challenger \n", "AMC Javelin 3.0 2.0 NaN AMC Javelin \n", "Camaro Z28 3.0 4.0 NaN Camaro Z28 \n", "Pontiac Firebird 3.0 2.0 NaN Pontiac Firebird \n", "Fiat X1-9 4.0 1.0 NaN Fiat X1-9 \n", "Porsche 914-2 5.0 2.0 NaN Porsche 914-2 \n", "Lotus Europa 5.0 2.0 120000.0 Lotus Europa \n", "Ford Pantera L 5.0 4.0 NaN Ford Pantera L \n", "Ferrari Dino 5.0 6.0 NaN Ferrari Dino \n", "Maserati Bora 5.0 8.0 NaN Maserati Bora \n", "Volvo 142E NaN NaN NaN Volvo 142E " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n2str = {\n", " 4.0: \"four\",\n", " 6.0: \"six\",\n", " 8.0: \"eight\"\n", " }\n", "df[\"cyl\"] = df[\"cyl\"].map(n2str)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{warning}\n", "Использовать числа с плавающей точкой в качестве ключей словаря считается плохой практикой из-за особенностей их представления в компьютере: даже минимальное отклонение от точного значения ключа в словаре может привести (и почти наверняка приведет) к другому значению хэш-функции и получить искомое значение по \"испорченному ключу не выйдет\".\n", "```\n", "\n", "Метод [DataFrame.applymap](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.applymap.html#pandas.DataFrame.applymap) применяет функцию к кажому значению в таблице." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Группировка по значениям столбца\n", "\n", "Часто необходимо сгруппировать строки таблицы по какому-то принципу и с каждой из групп проделать какие-то операции. За группировку в `pandas` отвечает метод [DataFrame.groupby](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html). В качестве аргумента её можно передать имя столбца, по значениям которого необходима группировка (можно также передать несколько столбцов в списке)." ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcyldisphpdratwtqsecvsamgearcarbpricebrandmodel
car
Mazda RX421.06.0160.0110.03.92.62016.460.01.04.04.0100000.0MazdaRX4
Mazda RX4 Wag21.06.0160.0110.03.92.87517.020.01.04.04.0NaNMazdaRX4 Wag
\n", "
" ], "text/plain": [ " mpg cyl disp hp drat wt qsec vs am gear \\\n", "car \n", "Mazda RX4 21.0 6.0 160.0 110.0 3.9 2.620 16.46 0.0 1.0 4.0 \n", "Mazda RX4 Wag 21.0 6.0 160.0 110.0 3.9 2.875 17.02 0.0 1.0 4.0 \n", "\n", " carb price brand model \n", "car \n", "Mazda RX4 4.0 100000.0 Mazda RX4 \n", "Mazda RX4 Wag 4.0 NaN Mazda RX4 Wag " ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped = df.groupby(\"brand\")\n", "grouped.get_group(\"Mazda\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Каждую группу можно рассматривать как мини-таблицу. Часто таблицу группируют по значениям одного столбца, чтобы потом посчитать какие-нибудь агрегирующие функцию к значениям других столбцов. Сделать это можно методом [DataFrameGroupBy.aggregate](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html). " ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
minmeanmaxcount
brand
AMC15.215.20000015.21
Cadillac10.410.40000010.41
Camaro13.313.30000013.31
Chrysler14.714.70000014.71
Datsun22.822.80000022.81
Dodge15.515.50000015.51
Duster14.314.30000014.31
Ferrari19.719.70000019.71
Fiat27.329.85000032.42
Ford15.815.80000015.81
Honda30.430.40000030.41
Hornet18.720.05000021.42
Lincoln10.410.40000010.41
Lotus30.430.40000030.41
Maserati15.015.00000015.01
Mazda21.021.00000021.02
Merc15.219.01428624.47
Pontiac19.219.20000019.21
Porsche26.026.00000026.01
Toyota21.527.70000033.92
Valiant18.118.10000018.11
VolvoNaNNaNNaN0
\n", "
" ], "text/plain": [ " min mean max count\n", "brand \n", "AMC 15.2 15.200000 15.2 1\n", "Cadillac 10.4 10.400000 10.4 1\n", "Camaro 13.3 13.300000 13.3 1\n", "Chrysler 14.7 14.700000 14.7 1\n", "Datsun 22.8 22.800000 22.8 1\n", "Dodge 15.5 15.500000 15.5 1\n", "Duster 14.3 14.300000 14.3 1\n", "Ferrari 19.7 19.700000 19.7 1\n", "Fiat 27.3 29.850000 32.4 2\n", "Ford 15.8 15.800000 15.8 1\n", "Honda 30.4 30.400000 30.4 1\n", "Hornet 18.7 20.050000 21.4 2\n", "Lincoln 10.4 10.400000 10.4 1\n", "Lotus 30.4 30.400000 30.4 1\n", "Maserati 15.0 15.000000 15.0 1\n", "Mazda 21.0 21.000000 21.0 2\n", "Merc 15.2 19.014286 24.4 7\n", "Pontiac 19.2 19.200000 19.2 1\n", "Porsche 26.0 26.000000 26.0 1\n", "Toyota 21.5 27.700000 33.9 2\n", "Valiant 18.1 18.100000 18.1 1\n", "Volvo NaN NaN NaN 0" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped[\"mpg\"].aggregate([\"min\", \"mean\", \"max\", \"count\"])" ] } ], "metadata": { "interpreter": { "hash": "cd49b4596bae9c980ff74fdf93e8fe80e447435ae307c062fad6c4f9ef2eb47f" }, "kernelspec": { "display_name": "Python 3.8.10 64-bit ('venv': venv)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }