b. Find output based on the following Series code:
import pandas as pd import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103]) print(s)
Answers
Answer:
We’ll start with a quick, non-comprehensive overview of the fundamental data structures in pandas to get you started. The fundamental behavior about data types, indexing, and axis labeling / alignment apply across all of the objects. To get started, import NumPy and load pandas into your namespace:
In [1]: import numpy as np
In [2]: import pandas as pd
Here is a basic tenet to keep in mind: data alignment is intrinsic. The link between labels and data will not be broken unless done so explicitly by you.
We’ll give a brief intro to the data structures, then consider all of the broad categories of functionality and methods in separate sections.
Series
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:
>>> s = pd.Series(data, index=index)
Here, data can be many different things:
a Python dict
an ndarray
a scalar value (like 5)
The passed index is a list of axis labels. Thus, this separates into a few cases depending on what data is:
From ndarray
If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1].
In [3]: s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
In [4]: s
Out[4]:
a 0.469112
b -0.282863
c -1.509059
d -1.135632
e 1.212112
dtype: float64
In [5]: s.index
Out[5]: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
In [6]: pd.Series(np.random.randn(5))
Out[6]:
0 -0.173215
1 0.119209
2 -1.044236
3 -0.861849
4 -2.104569
dtype: float64
Note
pandas supports non-unique index values. If an operation that does not support duplicate index values is attempted, an exception will be raised at that time. The reason for being lazy is nearly all performance-based (there are many instances in computations, like parts of GroupBy, where the index is not used).
From dict
Series can be instantiated from dicts:
In [7]: d = {'b': 1, 'a': 0, 'c': 2}
In [8]: pd.Series(d)
Out[8]:
b 1
a 0
c 2
dtype: int64
Note
When the data is a dict, and an index is not passed, the Series index will be ordered by the dict’s insertion order, if you’re using Python version >= 3.6 and Pandas version >= 0.23.
If you’re using Python < 3.6 or Pandas < 0.23, and an index is not passed, the Series index will be the lexically ordered list of dict keys.
In the example above, if you were on a Python version lower than 3.6 or a Pandas version lower than 0.23, the Series would be ordered by the lexical order of the dict keys (i.e. ['a', 'b', 'c'] rather than ['b', 'a', 'c']).
If an index is passed, the values in data corresponding to the labels in the index will be pulled out.
In [9]: d = {'a': 0., 'b': 1., 'c': 2.}
In [10]: pd.Series(d)
Out[10]:
a 0.0
b 1.0
c 2.0
dtype: float64
In [11]: pd.Series(d, index=['b', 'c', 'd', 'a'])
Out[11]:
b 1.0
c 2.0
d NaN
a 0.0
dtype: float64
Note
NaN (not a number) is the standard missing data marker used in pandas.
From scalar value
If data is a scalar value, an index must be provided. The value will be repeated to match the length of index.
In [12]: pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])
Out[12]:
a 5.0
b 5.0
c 5.0
d 5.0
e 5.0
dtype: float64
Series is ndarray-like
Series acts very similarly to a ndarray, and is a valid argument to most NumPy functions. However, operations such as slicing will also slice the index.
In [13]: s[0]
Out[13]: 0.4691122999071863
In [14]: s[:3]
Out[14]:
a 0.469112
b -0.282863
c -1.509059
dtype: float64
In [15]: s[s > s.median()]
Out[15]:
a 0.469112
e 1.212112
dtype: float64
In [16]: s[[4, 3, 1]]
Out[16]:
e 1.212112
d -1.135632
b -0.282863
dtype: float64
In [17]: np.exp(s)
Out[17]:
a 1.598575
b 0.753623
c 0.221118
d 0.321219
e 3.360575
dtype: float64
make me brain.......
and follow me please