LAB 1C - Pandas

Why pandas?

NumPy is great, but it lacks a few things that are conducive to statistical analysis. By building on top of NumPy, pandas provides:

  • labeled arrays
  • heterogeneous data types within a table
  • "better" missing data handling
  • convenient methods (groupby, rolling, resample)
  • more data types (Categorical, Datetime)
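Here is a quick sketch of two items from the list above, using made-up values: per-column dtypes, and how NaN is treated by a reduction in NumPy versus pandas.

```python
import numpy as np
import pandas as pd

# A NumPy array has a single dtype: mixing an int, a float and a string
# upcasts every element to a string dtype
arr = np.array([1, 2.5, 'x'])
print(arr.dtype.kind)  # 'U' (unicode string)

# A DataFrame stores one dtype per column, so types can differ across columns
table = pd.DataFrame({'n': [1, 2, 3],
                      'flag': [True, False, True],
                      'x': [0.5, 1.5, 2.5]})
print(table.dtypes)

# NumPy's mean propagates NaN; pandas skips missing values by default (skipna=True)
values = [1.0, np.nan, 3.0]
print(np.mean(np.array(values)))  # nan
print(pd.Series(values).mean())   # 2.0
```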

Data Structures

This is the typical starting point for any intro to pandas. We'll follow suit.

The DataFrame

Here we have the workhorse data structure for pandas. It's an in-memory table holding your data, and it provides a few conveniences over lists of lists or NumPy arrays.

In [2]:
import numpy as np
import pandas as pd
In [32]:
# Many ways to construct a DataFrame
# We pass a dict of {column name: column values}
np.random.seed(42)
df = pd.DataFrame({'A': [1, 2, 3], 
                   'B': [True, True, False],
                   'C': np.random.randn(3)},
                  index=['a', 'b', 'c'])  # also this weird index thing
df
Out[32]:
   A      B         C
a  1   True  0.496714
b  2   True -0.138264
c  3  False  0.647689

Notice that we can store a column of integers, a column of booleans, and a column of floats in the same DataFrame.

Indexing

Our first improvement over NumPy arrays is labeled indexing. We can select subsets by column, row, or both. Column selection uses the regular Python __getitem__ machinery: pass in a single column label 'A' or a list of labels ['A', 'C'] to select subsets of the original DataFrame.

In [3]:
# Single column, reduces to a Series
df['A']
Out[3]:
a    1
b    2
c    3
Name: A, dtype: int64
In [4]:
cols = ['A', 'C']
df[cols]
Out[4]:
   A         C
a  1  0.496714
b  2 -0.138264
c  3  0.647689

For row-wise selection, use the special .loc accessor.

In [5]:
df.loc[['a', 'b']]
Out[5]:
   A     B         C
a  1  True  0.496714
b  2  True -0.138264

You can use ranges to select rows or columns.

In [6]:
df.loc['a':'b']
Out[6]:
   A     B         C
a  1  True  0.496714
b  2  True -0.138264

Notice that the slice is inclusive on both sides, unlike your typical slicing of a list. Sometimes, you'd rather slice by position instead of label. .iloc has you covered:

In [16]:
df.iloc[[0, 2]]
Out[16]:
   A      B         C
a  1   True  0.496714
c  3  False  0.647689
In [21]:
df.iloc[:2]
Out[21]:
   A     B         C
a  1  True  0.496714
b  2  True -0.138264

This follows the usual Python slicing rules: closed on the left, open on the right.
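To see the two slicing conventions side by side, here is a small sketch on a toy one-column frame (the data is made up): the label slice keeps both endpoints, while the positional slice behaves like slicing a plain Python list.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]}, index=['a', 'b', 'c'])

by_label = df.loc['a':'b']          # label slice: both endpoints included
by_position = df.iloc[0:1]          # positional slice: stop is excluded
plain_list = ['a', 'b', 'c'][0:1]   # a regular list slice behaves like .iloc

print(list(by_label.index))     # ['a', 'b']
print(list(by_position.index))  # ['a']
print(plain_list)               # ['a']
```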

As I mentioned, you can slice both rows and columns. Use .loc for label or .iloc for position indexing.

In [19]:
df.loc['a', 'B'], df.iloc[0, 1]
Out[19]:
(True, True)

Pandas, like NumPy, will reduce dimensions when possible. Select a single column and you get back a Series (see below). Select a single row and a single column and you get a scalar.
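A short sketch of the dimension-reduction rule, on a frame shaped like the one above (values made up): a single label drops a dimension, while a list containing one label keeps it.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [True, True, False]},
                  index=['a', 'b', 'c'])

print(type(df['A']).__name__)      # Series: single column label
print(type(df[['A']]).__name__)    # DataFrame: a list with one label keeps 2-D
print(type(df.loc['a']).__name__)  # Series: single row label
print(type(df.loc[['a']]).__name__)  # DataFrame
print(df.loc['a', 'A'])            # 1: single row and single column -> scalar
```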

You can get pretty fancy:

In [22]:
df.loc['a':'b', ['A', 'C']]
Out[22]:
   A         C
a  1  0.496714
b  2 -0.138264

Summary

  • Use [] for selecting columns
  • Use .loc[row_labels, column_labels] for label-based indexing
  • Use .iloc[row_positions, column_positions] for positional indexing

I've left out boolean and hierarchical indexing, which we'll see later.

Series

You've already seen some Series up above. It's the 1-dimensional analog of the DataFrame. Each column in a DataFrame is in some sense a Series. You can select a Series from a DataFrame in a few ways:

In [11]:
# __getitem__ like before
df['A']
Out[11]:
a    1
b    2
c    3
Name: A, dtype: int64
In [12]:
# .loc, like before
df.loc[:, 'A']
Out[12]:
a    1
b    2
c    3
Name: A, dtype: int64
In [25]:
# using `.` attribute lookup
df.A
Out[25]:
a    1
b    2
c    3
Name: A, dtype: int64
In [26]:
df['mean'] = ['a', 'b', 'c']
In [31]:
df['mean']
Out[31]:
a    a
b    b
c    c
Name: mean, dtype: object
In [39]:
df.mean
Out[39]:
<bound method DataFrame.mean of    A      B         C
a  1   True  0.496714
b  2   True -0.138264
c  3  False  0.647689>

You'll have to be careful with the last one. It won't work if your column name isn't a valid Python identifier (say, it has a space) or if it conflicts with one of the (many) methods on DataFrame, as with the 'mean' column above: df.mean returns the method, not the column. The . accessor is extremely convenient for interactive use, though.
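A minimal sketch of both failure modes, using a made-up frame with a column name containing a space and a column named 'mean':

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'my col': [4, 5, 6],
                   'mean': ['a', 'b', 'c']})

print(df.A.tolist())          # [1, 2, 3]: a valid identifier with no clash
print(df['my col'].tolist())  # [4, 5, 6]: the space rules out attribute access
print(callable(df.mean))      # True: df.mean is the DataFrame.mean method
print(df['mean'].tolist())    # ['a', 'b', 'c']: [] always reaches the column
```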

You should never assign a column with the . accessor. For example, don't do

# bad
df.A = [1, 2, 3]

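It's unclear whether you're attaching the list [1, 2, 3] as an attribute of df or assigning it to a column. (If no column with that name exists yet, pandas sets a plain attribute, with a warning, and no column is created.) It's better to just say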

df['A'] = [1, 2, 3]
# or
df.loc[:, 'A'] = [1, 2, 3]
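A small sketch of why attribute assignment is ambiguous, on a made-up frame: the same syntax silently does something different depending on whether the column already exists. The warning pandas usually emits is suppressed here only to keep the output short.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas typically issues a UserWarning here
    df.D = [7, 8, 9]  # no column 'D' exists, so this only sets an attribute

print('D' in df.columns)  # False: no column was created

df['D'] = [7, 8, 9]       # item assignment creates (or overwrites) the column
print('D' in df.columns)  # True

df.A = [4, 5, 6]          # 'A' already exists, so this one does update the column
print(df['A'].tolist())   # [4, 5, 6]
```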

Series share many of the same methods as DataFrames.
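As a rough illustration with made-up values, a Series can also be built directly with pd.Series, and it supports the same kind of reductions you would call on a DataFrame column:

```python
import pandas as pd

s = pd.Series([1.0, 4.0, 2.5], index=['a', 'b', 'c'], name='C')

print(s.mean(), s.max())      # 2.5 4.0
print(s.idxmax())             # b: the label of the largest value
print(s.describe()['count'])  # 3.0
```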

Index

Indexes are something of a peculiarity of pandas. First off, they are not the kind of indexes you'll find in SQL, which help the engine speed up certain queries. In pandas, Indexes are about labels. This helps with selection (like we did above) and with automatic alignment when performing operations between two DataFrames or Series.

R does have row labels, but they're nowhere near as powerful (or complicated) as in pandas. You can access the index of a DataFrame or Series with the .index attribute.

In [40]:
df.index
Out[40]:
Index(['a', 'b', 'c'], dtype='object')
In [41]:
df.columns
Out[41]:
Index(['A', 'B', 'C'], dtype='object')
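Automatic alignment is easiest to see with two small Series whose labels only partly overlap (values made up). Arithmetic matches values by label rather than by position, and labels present in only one operand produce NaN unless you supply a fill_value.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

total = s1 + s2                       # aligned on the union of the labels
print(total.index.tolist())           # ['a', 'b', 'c', 'd']
print(total.isna().tolist())          # [True, False, False, True]
print(total['b'], total['c'])         # 12.0 23.0

filled = s1.add(s2, fill_value=0)     # treat a missing side as 0 instead
print(filled.tolist())                # [1.0, 12.0, 23.0, 30.0]
```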

Operations

In [57]:
np.random.seed(42)
df = pd.DataFrame(np.random.uniform(0, 100, size=(3, 3)))
# df = pd.DataFrame(np.random.randn(3, 3))
# df = pd.DataFrame(np.random.random([3, 3]))
df
Out[57]:
           0          1          2
0  37.454012  95.071431  73.199394
1  59.865848  15.601864  15.599452
2   5.808361  86.617615  60.111501
In [58]:
df + 1
Out[58]:
           0          1          2
0  38.454012  96.071431  74.199394
1  60.865848  16.601864  16.599452
2   6.808361  87.617615  61.111501
In [59]:
df ** 2
Out[59]:
             0            1            2
0  1402.803006  9038.576924  5358.151308
1  3583.919807   243.418162   243.342904
2    33.737060  7502.611155  3613.392573
In [60]:
np.log(df)
Out[60]:
          0         1         2
0  3.623114  4.554629  4.293187
1  4.092106  2.747390  2.747236
2  1.759298  4.461503  4.096201

DataFrames and Series have a bunch of useful aggregation methods, .mean, .max, .std, etc.

In [61]:
df.mean()
Out[61]:
0    34.376074
1    65.763636
2    49.636782
dtype: float64
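Aggregations reduce down the rows by default (axis=0, one result per column). A brief sketch on the same random frame: axis=1 gives one result per row instead, and DataFrame.agg runs several reductions at once.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.uniform(0, 100, size=(3, 3)))

col_means = df.mean()        # default axis=0: one mean per column
row_means = df.mean(axis=1)  # axis=1: one mean per row
summary = df.agg(['mean', 'max', 'std'])  # one row per reduction

print(col_means.round(6).tolist())  # should match Out[61] above
print(row_means.round(6).tolist())
print(summary)
```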

Loading Data

In [66]:
df = pd.read_csv('beer_subset.csv.gz', parse_dates=['time'], compression='gzip')
review_cols = ['review_appearance', 'review_aroma', 'review_overall',
               'review_palate', 'review_taste']
df.head()
Out[66]:
abv beer_id brewer_id beer_name beer_style review_appearance review_aroma review_overall review_palate profile_name review_taste text time
0 7.0 2511 287 Bell's Cherry Stout American Stout 4.5 4.0 4.5 4.0 blaheath 4.5 Batch 8144\tPitch black in color with a 1/2 f... 2009-10-05 21:31:48
1 5.7 19736 9790 Duck-Rabbit Porter American Porter 4.5 4.0 4.5 4.0 GJ40 4.0 Sampled from a 12oz bottle in a standard pint... 2009-10-05 21:32:09
2 4.8 11098 3182 Fürstenberg Premium Pilsener German Pilsener 4.0 3.0 3.0 3.0 biegaman 3.5 Haystack yellow with an energetic group of bu... 2009-10-05 21:32:13
3 9.5 28577 3818 Unearthly (Imperial India Pale Ale) American Double / Imperial IPA 4.0 4.0 4.0 4.0 nick76 4.0 The aroma has pine, wood, citrus, caramel, an... 2009-10-05 21:32:37
4 5.8 398 119 Wolaver's Pale Ale American Pale Ale (APA) 4.0 3.0 4.0 3.5 champ103 3.0 A: Pours a slightly hazy golden/orange color.... 2009-10-05 21:33:14
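Since beer_subset.csv.gz may not be at hand, here is a sketch of what parse_dates does, assuming a tiny in-memory CSV built from two of the rows above (only a few columns kept). io.StringIO stands in for the file path.

```python
import io
import pandas as pd

# Two rows copied from the output above, reduced to four columns
csv_text = """abv,beer_style,review_overall,time
7.0,American Stout,4.5,2009-10-05 21:31:48
4.8,German Pilsener,3.0,2009-10-05 21:32:13
"""

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['time'])
print(sample.dtypes)                    # 'time' is a datetime64 column
print(sample['time'].dt.hour.tolist())  # [21, 21]
```

Without parse_dates, the time column would be read as plain strings and the .dt accessor would not be available.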

Boolean indexing

Like a where clause in SQL. The indexer (or boolean mask) should be 1-dimensional and the same length as the thing being indexed.

In [67]:
df.abv < 5
Out[67]:
0      False
1      False
2       True
3      False
4      False
       ...  
994    False
995    False
996    False
997    False
998    False
Name: abv, Length: 999, dtype: bool
In [68]:
df[df.abv < 5].head()
Out[68]:
abv beer_id brewer_id beer_name beer_style review_appearance review_aroma review_overall review_palate profile_name review_taste text time
2 4.8 11098 3182 Fürstenberg Premium Pilsener German Pilsener 4.0 3.0 3.0 3.0 biegaman 3.5 Haystack yellow with an energetic group of bu... 2009-10-05 21:32:13
7 4.8 1669 256 Great White Witbier 4.5 4.5 4.5 4.5 n0rc41 4.5 Ok, for starters great white I believe will b... 2009-10-05 21:34:29
21 4.6 401 118 Dark Island Scottish Ale 4.0 4.0 3.5 4.0 abuliarose 4.0 Poured into a snifter, revealing black opaque... 2009-10-05 21:47:36
22 4.9 5044 18968 Kipona Fest Märzen / Oktoberfest 4.0 3.5 4.0 4.0 drcarver 4.0 A - a medium brown body with an off white hea... 2009-10-05 21:47:56
28 4.6 401 118 Dark Island Scottish Ale 4.0 4.0 4.5 4.0 sisuspeed 4.0 The color of this beer fits the name well. Op... 2009-10-05 21:53:38

Notice that we just used [] there. We can pass the boolean indexer into .loc as well, which lets us pick columns at the same time.

In [69]:
df.loc[df.abv < 5, ['beer_style', 'review_overall']].head()
Out[69]:
              beer_style  review_overall
2        German Pilsener             3.0
7                Witbier             4.5
21          Scottish Ale             3.5
22  Märzen / Oktoberfest             4.0
28          Scottish Ale             4.5

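Again, you can get complicated. Combine conditions with & (and), | (or), and ~ (not), and wrap each comparison in parentheses, because these operators bind more tightly than comparisons like <: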

In [72]:
df[((df.abv < 5) & (df.time > pd.Timestamp('2009-06'))) | (df.review_overall >= 4.5)]
Out[72]:
abv beer_id brewer_id beer_name beer_style review_appearance review_aroma review_overall review_palate profile_name review_taste text time
0 7.0 2511 287 Bell's Cherry Stout American Stout 4.5 4.0 4.5 4.0 blaheath 4.5 Batch 8144\tPitch black in color with a 1/2 f... 2009-10-05 21:31:48
1 5.7 19736 9790 Duck-Rabbit Porter American Porter 4.5 4.0 4.5 4.0 GJ40 4.0 Sampled from a 12oz bottle in a standard pint... 2009-10-05 21:32:09
2 4.8 11098 3182 Fürstenberg Premium Pilsener German Pilsener 4.0 3.0 3.0 3.0 biegaman 3.5 Haystack yellow with an energetic group of bu... 2009-10-05 21:32:13
6 6.2 53128 1114 Smokin' Amber Kegs Gone Wild American Amber / Red Ale 3.5 4.0 4.5 4.0 Deuane 4.5 An American amber with the addition of smoked... 2009-10-05 21:34:24
7 4.8 1669 256 Great White Witbier 4.5 4.5 4.5 4.5 n0rc41 4.5 Ok, for starters great white I believe will b... 2009-10-05 21:34:29
... ... ... ... ... ... ... ... ... ... ... ... ... ...
987 7.2 39296 14400 Oatis Oatmeal Stout 4.0 4.0 4.5 4.5 GJ40 4.0 Sampled from a 22oz bottle purchased at Pike ... 2009-10-07 01:50:50
989 7.0 782 113 Samuel Smith's Imperial Stout Russian Imperial Stout 5.0 4.0 4.5 4.0 SamN 3.0 Bomber purchased from Campus West Liquors and... 2009-10-07 01:54:05
992 5.7 46767 8 Drifter Pale Ale American Pale Ale (APA) 3.5 4.0 4.5 4.0 thespaceman 4.0 Had on tap at Smokin' With Chris in Southingt... 2009-10-07 01:57:03
993 10.0 36728 18149 10 Commandments Belgian Strong Dark Ale 4.0 4.0 4.5 4.0 ClockworkOrange 4.0 This bottle has been in the cellar for at lea... 2009-10-07 01:57:56
995 6.5 49728 2874 St. Feuillien Saison Saison / Farmhouse Ale 4.0 3.5 5.0 4.0 Kraken 4.5 Reviewed 10/6/09\t\tPoured from a corked and ... 2009-10-07 01:58:45

353 rows × 13 columns
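Why the element-wise operators rather than Python's and/or? A small sketch using the abv and review_overall values of rows 0, 7 and 21 from the output above (re-indexed 0 to 2): `and` asks each Series for a single truth value, which pandas refuses with a ValueError.

```python
import pandas as pd

df = pd.DataFrame({'abv': [7.0, 4.8, 4.6],
                   'review_overall': [4.5, 4.5, 3.5]})

try:
    df[(df.abv < 5) and (df.review_overall >= 4.5)]
except ValueError as err:
    print('ValueError:', err)  # "The truth value of a Series is ambiguous..."

both = (df.abv < 5) & (df.review_overall >= 4.5)    # element-wise and
either = (df.abv < 5) | (df.review_overall >= 4.5)  # element-wise or
print(df[both].index.tolist())           # [1]
print(df[either].index.tolist())         # [0, 1, 2]
print(df[~(df.abv < 5)].index.tolist())  # [0]
```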

Exercise: Find the American beers

Select just the rows where the beer_style contains 'American'.

Hint: Series containing strings have a bunch of useful methods under the DataFrame.<column>.str namespace. Typically they correspond to regular Python string methods, but

  • They gracefully propagate missing values
  • They're a bit more liberal about accepting regular expressions

We can't use 'American' in df['beer_style'], since for a Series, in checks membership among the index labels, not within the string values. But in uses __contains__, so look for a string method with a similar name.
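A short sketch of these points on a hypothetical Series with one missing value: `in` looks at the index, str.contains looks inside the strings, na=False turns the missing entry into a plain False so the mask can be used for indexing, and the pattern is a regular expression by default.

```python
import numpy as np
import pandas as pd

styles = pd.Series(['American Stout', 'Witbier', np.nan, 'American IPA'],
                   index=['w', 'x', 'y', 'z'])

print('American Stout' in styles)  # False: `in` checks the index labels
print('w' in styles)               # True

mask = styles.str.contains('American', na=False)  # missing value -> False
print(mask.tolist())               # [True, False, False, True]
print(styles[mask].tolist())       # ['American Stout', 'American IPA']

# regex=True is the default, so alternation works
print(styles.str.contains('Stout|IPA', na=False).tolist())  # [True, False, False, True]
```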

In [73]:
df.beer_style.str.contains("American")
Out[73]:
0       True
1       True
2      False
3       True
4       True
       ...  
994     True
995    False
996     True
997    False
998     True
Name: beer_style, Length: 999, dtype: bool
In [74]:
# Your solution
is_american = df.beer_style.str.contains("American")
df[is_american]
Out[74]:
abv beer_id brewer_id beer_name beer_style review_appearance review_aroma review_overall review_palate profile_name review_taste text time
0 7.0 2511 287 Bell's Cherry Stout American Stout 4.5 4.0 4.5 4.0 blaheath 4.5 Batch 8144\tPitch black in color with a 1/2 f... 2009-10-05 21:31:48
1 5.7 19736 9790 Duck-Rabbit Porter American Porter 4.5 4.0 4.5 4.0 GJ40 4.0 Sampled from a 12oz bottle in a standard pint... 2009-10-05 21:32:09
3 9.5 28577 3818 Unearthly (Imperial India Pale Ale) American Double / Imperial IPA 4.0 4.0 4.0 4.0 nick76 4.0 The aroma has pine, wood, citrus, caramel, an... 2009-10-05 21:32:37
4 5.8 398 119 Wolaver's Pale Ale American Pale Ale (APA) 4.0 3.0 4.0 3.5 champ103 3.0 A: Pours a slightly hazy golden/orange color.... 2009-10-05 21:33:14
5 7.0 966 365 Pike Street XXXXX Stout American Stout 4.0 4.0 3.5 4.0 sprucetip 4.5 From notes. Pours black, thin mocha head fade... 2009-10-05 21:33:48
... ... ... ... ... ... ... ... ... ... ... ... ... ...
988 5.4 61547 23058 Hoppy Dog American Pale Ale (APA) 4.0 2.0 2.0 3.0 ClockworkOrange 2.0 A very nice looking cobalt blue 1L swing top.... 2009-10-07 01:52:06
992 5.7 46767 8 Drifter Pale Ale American Pale Ale (APA) 3.5 4.0 4.5 4.0 thespaceman 4.0 Had on tap at Smokin' With Chris in Southingt... 2009-10-07 01:57:03
994 6.5 184 141 Smuttynose Old Brown Dog Ale American Brown Ale 4.5 4.0 3.5 4.0 Jayli 4.0 consumed 10/2/09\tThis beer poured a nice cle... 2009-10-07 01:58:20
996 9.8 21166 156 Imperial Nut Brown Ale American Brown Ale 3.5 4.0 2.5 3.5 natelocc787 4.0 This beer contained sediment. Maple and brown... 2009-10-07 02:02:23
998 10.4 28578 3818 Jahva (Imperial Coffee Stout) American Double / Imperial Stout 4.0 4.0 3.0 4.0 ritzkiss 3.5 22oz bottle from Bobsy a while ago, finally c... 2009-10-07 02:02:37

426 rows × 13 columns

Groupby

Groupby is a fundamental operation in pandas and in data analysis.

The components of a groupby operation are to

  1. Split a table into groups
  2. Apply a function to each group
  3. Combine the results
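To make the three steps concrete, here is a sketch that performs them by hand on hypothetical data and compares the result with groupby:

```python
import pandas as pd

df = pd.DataFrame({'beer_style': ['Stout', 'IPA', 'Stout', 'IPA', 'Tripel'],
                   'abv': [7.0, 6.5, 5.0, 7.5, 9.0]})

# 1. Split: one sub-DataFrame per key
pieces = {key: df[df.beer_style == key] for key in df.beer_style.unique()}
# 2. Apply: mean abv of each piece
results = {key: piece.abv.mean() for key, piece in pieces.items()}
# 3. Combine: gather the results into one Series, sorted by key like groupby
manual = pd.Series(results).sort_index()

builtin = df.groupby('beer_style').abv.mean()
print(manual.tolist())   # [7.0, 6.0, 9.0]
print(builtin.tolist())  # [7.0, 6.0, 9.0]
```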

In pandas the first step looks like

df.groupby(grouper)

grouper can be many things:

  • Series (or string indicating a column in df)
  • function (called on each index label)
  • dict: maps index labels to group keys (the dict's values define the groups)
  • level=[names of levels in a MultiIndex]
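A sketch of each kind of grouper on a hypothetical four-row frame whose string row labels have different lengths (so that grouping by len on the index gives something interesting):

```python
import pandas as pd

df = pd.DataFrame({'style': ['Stout', 'IPA', 'Stout', 'IPA'],
                   'abv': [7.0, 6.5, 5.0, 7.5]},
                  index=['w', 'x', 'yy', 'zz'])

by_column = df.groupby('style')['abv'].mean()  # column name
by_function = df.groupby(len)['abv'].sum()     # len() called on each index label
by_dict = df.groupby({'w': 'first', 'x': 'first',
                      'yy': 'second', 'zz': 'second'})['abv'].max()  # label -> key

# With a MultiIndex, group on one of its levels by name
by_level = (df.set_index('style', append=True)
              .groupby(level='style')['abv'].mean())

print(by_column.to_dict())    # {'IPA': 7.0, 'Stout': 6.0}
print(by_function.to_dict())  # {1: 13.5, 2: 12.5}
print(by_dict.to_dict())      # {'first': 7.0, 'second': 7.5}
print(by_level.to_dict())     # {'IPA': 7.0, 'Stout': 6.0}
```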
In [76]:
gr = df.groupby('beer_style')
gr
Out[76]:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fd1e7ec3250>

We haven't really done anything yet, just some bookkeeping to figure out which keys go with which rows. Keys are the things we've grouped by (each beer_style in this case).
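You can peek at that bookkeeping before aggregating anything. A minimal sketch on hypothetical data, using the GroupBy attributes ngroups and groups and the methods size and get_group:

```python
import pandas as pd

df = pd.DataFrame({'beer_style': ['Stout', 'IPA', 'Stout'],
                   'abv': [7.0, 6.5, 5.0]})
gr = df.groupby('beer_style')

print(gr.ngroups)                   # 2 distinct keys
print(gr.groups)                    # each key -> the row labels in that group
print(gr.size().to_dict())          # {'IPA': 1, 'Stout': 2}
print(gr.get_group('Stout'))        # the rows for a single key
```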

The last two steps, apply and combine, are just:

In [77]:
gr.agg('mean')
Out[77]:
abv beer_id brewer_id review_appearance review_aroma review_overall review_palate review_taste
beer_style
Altbier 5.850000 43260.500000 419.500000 4.000000 3.750000 4.000000 3.750000 4.000000
American Adjunct Lager 4.872727 12829.909091 2585.909091 2.954545 2.613636 3.272727 2.909091 2.750000
American Amber / Red Ale 6.195652 28366.777778 2531.111111 3.740741 3.592593 3.870370 3.555556 3.777778
American Amber / Red Lager 4.822857 22277.500000 5620.125000 3.437500 3.312500 3.375000 3.187500 3.125000
American Barleywine 10.208333 32457.250000 3744.083333 3.958333 3.937500 3.729167 3.895833 3.937500
... ... ... ... ... ... ... ... ...
Tripel 9.329412 16027.705882 2882.882353 4.264706 4.088235 3.970588 3.911765 4.176471
Vienna Lager 4.985714 19497.750000 6180.750000 3.500000 3.250000 3.375000 3.562500 3.312500
Weizenbock 8.350000 19540.500000 250.000000 4.000000 3.750000 4.250000 4.250000 4.250000
Wheatwine 11.075000 36980.000000 629.600000 3.800000 4.000000 3.500000 4.000000 3.700000
Witbier 6.175000 27346.600000 3583.700000 3.750000 3.650000 3.600000 3.550000 3.650000

93 rows × 8 columns

In [33]:
df.groupby('beer_style').mean()
Out[33]:
abv beer_id brewer_id review_appearance review_aroma review_overall review_palate review_taste
beer_style
Altbier 5.850000 43260.500000 419.500000 4.000000 3.750000 4.000000 3.750000 4.000000
American Adjunct Lager 4.872727 12829.909091 2585.909091 2.954545 2.613636 3.272727 2.909091 2.750000
American Amber / Red Ale 6.195652 28366.777778 2531.111111 3.740741 3.592593 3.870370 3.555556 3.777778
American Amber / Red Lager 4.822857 22277.500000 5620.125000 3.437500 3.312500 3.375000 3.187500 3.125000
American Barleywine 10.208333 32457.250000 3744.083333 3.958333 3.937500 3.729167 3.895833 3.937500
... ... ... ... ... ... ... ... ...
Tripel 9.329412 16027.705882 2882.882353 4.264706 4.088235 3.970588 3.911765 4.176471
Vienna Lager 4.985714 19497.750000 6180.750000 3.500000 3.250000 3.375000 3.562500 3.312500
Weizenbock 8.350000 19540.500000 250.000000 4.000000 3.750000 4.250000 4.250000 4.250000
Wheatwine 11.075000 36980.000000 629.600000 3.800000 4.000000 3.500000 4.000000 3.700000
Witbier 6.175000 27346.600000 3583.700000 3.750000 3.650000 3.600000 3.550000 3.650000

93 rows × 8 columns

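This applies the mean function to each column. In the pandas version used here, non-numeric ("nuisance") columns are silently excluded; since pandas 2.0 the default is numeric_only=False, so the same call raises a TypeError on text columns unless you pass numeric_only=True (e.g. gr.mean(numeric_only=True)) or select the numeric columns first. We can also select a subset of columns to perform the aggregation on.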

In [34]:
gr[review_cols].agg('mean')
Out[34]:
review_appearance review_aroma review_overall review_palate review_taste
beer_style
Altbier 4.000000 3.750000 4.000000 3.750000 4.000000
American Adjunct Lager 2.954545 2.613636 3.272727 2.909091 2.750000
American Amber / Red Ale 3.740741 3.592593 3.870370 3.555556 3.777778
American Amber / Red Lager 3.437500 3.312500 3.375000 3.187500 3.125000
American Barleywine 3.958333 3.937500 3.729167 3.895833 3.937500
... ... ... ... ... ...
Tripel 4.264706 4.088235 3.970588 3.911765 4.176471
Vienna Lager 3.500000 3.250000 3.375000 3.562500 3.312500
Weizenbock 4.000000 3.750000 4.250000 4.250000 4.250000
Wheatwine 3.800000 4.000000 3.500000 4.000000 3.700000
Witbier 3.750000 3.650000 3.600000 3.550000 3.650000

93 rows × 5 columns

Attribute lookup with . works as well.

In [35]:
gr.abv.agg('mean')
Out[35]:
beer_style
Altbier                        5.850000
American Adjunct Lager         4.872727
American Amber / Red Ale       6.195652
American Amber / Red Lager     4.822857
American Barleywine           10.208333
                                ...    
Tripel                         9.329412
Vienna Lager                   4.985714
Weizenbock                     8.350000
Wheatwine                     11.075000
Witbier                        6.175000
Name: abv, Length: 93, dtype: float64

Certain operations are attached directly to the GroupBy object, letting you bypass the .agg part:

In [36]:
gr.abv.mean()
Out[36]:
beer_style
Altbier                        5.850000
American Adjunct Lager         4.872727
American Amber / Red Ale       6.195652
American Amber / Red Lager     4.822857
American Barleywine           10.208333
                                ...    
Tripel                         9.329412
Vienna Lager                   4.985714
Weizenbock                     8.350000
Wheatwine                     11.075000
Witbier                        6.175000
Name: abv, Length: 93, dtype: float64

Now we'll run the gamut on a bunch of grouper / apply combinations. Keep sight of the target though: split, apply, combine.

  • Grouper: Controls the output index
    • single grouper -> Index
    • array-like grouper -> MultiIndex
  • Subject (Groupee): Controls the output data values
    • single column -> Series (or DataFrame if multiple aggregations)
    • multiple columns -> DataFrame
  • Aggregation: Controls the output columns
    • single aggfunc -> Index in the columns
    • multiple aggfuncs -> MultiIndex in the columns (or a 1-D Index if the groupee is 1-D)
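The rules above can be checked directly by inspecting the shape of each result. A sketch on a hypothetical four-row frame:

```python
import pandas as pd

df = pd.DataFrame({'style': ['Stout', 'IPA', 'Stout', 'IPA'],
                   'brewer': ['x', 'x', 'y', 'y'],
                   'abv': [7.0, 6.5, 5.0, 7.5],
                   'review_overall': [4.5, 4.0, 3.5, 4.0]})

one_col_one_agg = df.groupby('style')['abv'].mean()
one_col_many_aggs = df.groupby('style')['abv'].agg(['mean', 'max'])
many_cols_many_aggs = df.groupby('style')[['abv', 'review_overall']].agg(['mean', 'max'])
two_groupers = df.groupby(['style', 'brewer'])['abv'].mean()

print(type(one_col_one_agg).__name__)              # Series
print(list(one_col_many_aggs.columns))             # ['mean', 'max']: 1-D column Index
print(type(many_cols_many_aggs.columns).__name__)  # MultiIndex in the columns
print(type(two_groupers.index).__name__)           # MultiIndex in the rows
```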

Multiple Aggregations on one column

In [37]:
gr['review_aroma'].agg(['mean', 'std', 'count']).head()
Out[37]:
                                mean       std  count
beer_style
Altbier                     3.750000  0.353553      2
American Adjunct Lager      2.613636  0.596255     22
American Amber / Red Ale    3.592593  0.636049     27
American Amber / Red Lager  3.312500  0.842509      8
American Barleywine         3.937500  0.449940     24

Single Aggregation on multiple columns

In [38]:
gr[review_cols].mean()
Out[38]:
review_appearance review_aroma review_overall review_palate review_taste
beer_style
Altbier 4.000000 3.750000 4.000000 3.750000 4.000000
American Adjunct Lager 2.954545 2.613636 3.272727 2.909091 2.750000
American Amber / Red Ale 3.740741 3.592593 3.870370 3.555556 3.777778
American Amber / Red Lager 3.437500 3.312500 3.375000 3.187500 3.125000
American Barleywine 3.958333 3.937500 3.729167 3.895833 3.937500
... ... ... ... ... ...
Tripel 4.264706 4.088235 3.970588 3.911765 4.176471
Vienna Lager 3.500000 3.250000 3.375000 3.562500 3.312500
Weizenbock 4.000000 3.750000 4.250000 4.250000 4.250000
Wheatwine 3.800000 4.000000 3.500000 4.000000 3.700000
Witbier 3.750000 3.650000 3.600000 3.550000 3.650000

93 rows × 5 columns

Multiple aggregations on multiple columns

In [39]:
gr[review_cols].agg(['mean', 'count', 'std'])
Out[39]:
review_appearance review_aroma review_overall review_palate review_taste
mean count std mean count std mean count std mean count std mean count std
beer_style
Altbier 4.000000 2 0.707107 3.750000 2 0.353553 4.000000 2 0.000000 3.750000 2 0.353553 4.000000 2 0.000000
American Adjunct Lager 2.954545 22 0.722250 2.613636 22 0.596255 3.272727 22 0.667748 2.909091 22 0.478996 2.750000 22 0.631514
American Amber / Red Ale 3.740741 27 0.625890 3.592593 27 0.636049 3.870370 27 0.629294 3.555556 27 0.640513 3.777778 27 0.763763
American Amber / Red Lager 3.437500 8 0.417261 3.312500 8 0.842509 3.375000 8 1.187735 3.187500 8 0.961305 3.125000 8 1.125992
American Barleywine 3.958333 24 0.529903 3.937500 24 0.449940 3.729167 24 0.465766 3.895833 24 0.389514 3.937500 24 0.517362
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Tripel 4.264706 17 0.358715 4.088235 17 0.363803 3.970588 17 0.329326 3.911765 17 0.317967 4.176471 17 0.350944
Vienna Lager 3.500000 8 0.377964 3.250000 8 0.534522 3.375000 8 0.517549 3.562500 8 0.678101 3.312500 8 0.458063
Weizenbock 4.000000 2 0.000000 3.750000 2 0.353553 4.250000 2 0.353553 4.250000 2 0.353553 4.250000 2 0.353553
Wheatwine 3.800000 5 0.273861 4.000000 5 0.353553 3.500000 5 0.353553 4.000000 5 0.000000 3.700000 5 0.447214
Witbier 3.750000 10 0.540062 3.650000 10 0.625833 3.600000 10 0.658281 3.550000 10 0.761942 3.650000 10 0.529675

93 rows × 15 columns
