Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
I am having an issue filtering my result dataframe with an 'or' condition. I want my result df to extract all values of column var that are above 0.25 or below -0.25.
The logic below gives me an ambiguous truth value, but it works when I split the filtering into two separate operations. What is happening here? I am not sure where to use the suggested a.empty(), a.bool(), a.item(), a.any() or a.all().
result = result[(result['var']>0.25) or (result['var']<-0.25)]
Reply #2
The 'or' and 'and' Python statements require truth values. For pandas these are considered ambiguous, so you should use the "bitwise" '|' (or) or '&' (and) operations:
result = result[(result['var']>0.25) | (result['var']<-0.25)]
These operators are overloaded for this kind of data structure to yield the element-wise 'or' (or 'and').
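To make the element-wise behaviour concrete, here is a minimal sketch on a made-up frame. Only the column name var comes from the question; the values are invented for illustration. Each comparison produces a boolean Series, and '|' combines them row by row into a mask that indexes the frame.

```python
import pandas as pd

# Hypothetical data; only the column name 'var' comes from the question.
result = pd.DataFrame({"var": [0.5, 0.1, -0.3, 0.0, -0.25]})

above = result["var"] > 0.25    # boolean Series, one entry per row
below = result["var"] < -0.25   # boolean Series, one entry per row
mask = above | below            # element-wise OR of the two Series

filtered = result[mask]         # keeps only the rows where mask is True
print(mask.tolist())
print(filtered["var"].tolist())
```

Note that -0.25 itself is dropped, because the comparison is strict.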
Just to add some more explanation to this statement:
The exception is thrown when you want to get the bool of a pandas.Series:
>>> import pandas as pd
>>> x = pd.Series([1])
>>> bool(x)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
What you hit was a place where the operator implicitly converted the operands to bool (you used 'or', but it also happens for 'and', 'if' and 'while'):
>>> x or x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> x and x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> if x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> while x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
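The implicit conversion can be made visible with a toy class. This is only a sketch: Loud is a made-up name, not part of pandas. It counts how often Python asks for its truth value. A Series raises from that same truth-value request instead of answering, which is why the statements above fail.

```python
class Loud:
    """Toy object that counts how often Python asks for its truth value."""

    def __init__(self):
        self.bool_calls = 0

    def __bool__(self):
        self.bool_calls += 1
        return True


a = Loud()
b = Loud()

picked = a or b   # 'or' calls bool(a); a is truthy, so b is never asked
if a:             # 'if' calls bool(a) a second time
    pass

print(a.bool_calls, b.bool_calls)
```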
Besides these 4 statements there are several Python functions that hide some bool calls (like any, all, filter, ...). These are normally not problematic with pandas.Series, but for completeness I wanted to mention them.
In your case the exception isn't really helpful, because it doesn't mention the right alternatives. For 'and' and 'or' you can use (if you want element-wise comparisons):
- numpy.logical_or:
  >>> import numpy as np
  >>> np.logical_or(x, y)
  or simply the | operator:
  >>> x | y
- numpy.logical_and:
  >>> np.logical_and(x, y)
  or simply the & operator:
  >>> x & y
If you're using the operators, make sure you set your parentheses correctly because of operator precedence.
There are several logical numpy functions which should work on pandas.Series.
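As a small illustration of those numpy functions, the sketch below applies four of them to two made-up boolean Series (x and y here are invented for the example). The comments name the operator that should give the same element-wise result.

```python
import numpy as np
import pandas as pd

# Made-up boolean Series covering all four True/False combinations.
x = pd.Series([True, True, False, False])
y = pd.Series([True, False, True, False])

either = np.logical_or(x, y)        # element-wise OR, like x | y
both = np.logical_and(x, y)         # element-wise AND, like x & y
exactly_one = np.logical_xor(x, y)  # element-wise XOR, like x ^ y
negated = np.logical_not(x)         # element-wise NOT, like ~x

print(np.asarray(either).tolist())
print(np.asarray(both).tolist())
```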
The alternatives mentioned in the exception are better suited if you encountered it when doing 'if' or 'while'. I'll briefly explain each of these:
- If you want to check whether your Series is empty:
  >>> x = pd.Series([])
  >>> x.empty
  True
  >>> x = pd.Series([1])
  >>> x.empty
  False
  Python normally interprets the length of containers (like list, tuple, ...) as their truth value if they have no explicit boolean interpretation. So if you want the Python-like check, you could do: if x.size or if not x.empty instead of if x.
- If your Series contains one and only one boolean value:
  >>> x = pd.Series([100])
  >>> (x > 50).bool()
  True
  >>> (x < 50).bool()
  False
- If you want to check the first and only item of your Series (like .bool(), but it works even for non-boolean contents):
  >>> x = pd.Series([100])
  >>> x.item()
  100
- If you want to check if all or any item is not-zero, not-empty or not-False:
  >>> x = pd.Series([0, 1, 2])
  >>> x.all()   # because one element is zero
  False
  >>> x.any()   # because one (or more) elements are non-zero
  True
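Put together, these reductions can replace a bare 'if x:' test. The following is only a sketch: describe is a hypothetical helper name and the sample Series are invented. The .bool() method is left out because recent pandas releases mark it as deprecated, as far as I know.

```python
import pandas as pd


def describe(series):
    """Label a Series using explicit reductions instead of 'if series:'."""
    if series.empty:      # no elements at all
        return "empty"
    if series.all():      # every element is truthy
        return "all non-zero"
    if series.any():      # at least one element is truthy
        return "some non-zero"
    return "all zero"


labels = [
    describe(pd.Series([], dtype="float64")),
    describe(pd.Series([1, 2])),
    describe(pd.Series([0, 1, 2])),
    describe(pd.Series([0, 0])),
]
only_value = pd.Series([100]).item()  # valid only for a one-element Series

print(labels, only_value)
```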
Reply #3
For boolean logic, use '&' and '|'.
>>> import numpy as np
>>> import pandas as pd
>>> np.random.seed(0)
>>> df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
>>> df
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
>>> df.loc[(df.C > 0.25) | (df.C < -0.25)]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
To see what is happening, note that each comparison gives you a column of booleans, e.g.
df.C > 0.25
0 True
1 False
2 False
3 True
4 True
Name: C, dtype: bool
When you have multiple criteria, you will get multiple columns returned. This is why the join logic is ambiguous. Using 'and' or 'or' treats each column as a whole, so you first need to reduce each column to a single boolean value. For example, you can check whether any value or all values in each of the columns is True.
# Any value in either column is True?
(df.C > 0.25).any() or (df.C < -0.25).any()
True

# All values in either column is True?
(df.C > 0.25).all() or (df.C < -0.25).all()
False
One convoluted way to achieve the same thing is to zip all of these columns together and perform the appropriate logic.
>>> df[[any([a, b]) for a, b in zip(df.C > 0.25, df.C < -0.25)]]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
For more details, refer to Boolean Indexing in the docs.
Reply #4
Or, alternatively, you could use the operator module. More detailed information is in the Python docs.
import operator
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df.loc[operator.or_(df.C > 0.25, df.C < -0.25)]
          A         B         C
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
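As a quick sanity check, the sketch below rebuilds the same seeded frame and compares three of the selections shown in this thread: the '|' operator, operator.or_, and the zip-based list. The variable names by_pipe, by_operator and by_zip are made up for this example.

```python
import operator

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=list("ABC"))

# Three ways of writing the same element-wise OR filter.
by_pipe = df.loc[(df.C > 0.25) | (df.C < -0.25)]
by_operator = df.loc[operator.or_(df.C > 0.25, df.C < -0.25)]
by_zip = df[[any([a, b]) for a, b in zip(df.C > 0.25, df.C < -0.25)]]

same = by_pipe.equals(by_operator) and by_pipe.equals(by_zip)
kept_rows = by_pipe.index.tolist()
print(same, kept_rows)
```

operator.or_(a, b) is simply the function form of a | b, so it is mainly useful when the operator has to be passed around as a value.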
Reply #5
This excellent answer explains very well what is happening and provides a solution. I would like to add another solution that might be suitable in similar cases: using the query method:
result = result.query("(var > 0.25) or (var < -0.25)")
See also .html#indexing-query .
(Some tests with a dataframe I'm currently working with suggest that this method is a bit slower than using the bitwise operators on series of booleans: 2 ms vs. 870 µs)
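Because query parses the expression string itself, the plain word 'or' is accepted there. A minimal sketch with invented values (only the column name var comes from the question) compares it with the mask version:

```python
import pandas as pd

# Made-up values; only the column name 'var' comes from the question.
result = pd.DataFrame({"var": [0.5, 0.1, -0.3, 0.0]})

by_query = result.query("(var > 0.25) or (var < -0.25)")
by_mask = result[(result["var"] > 0.25) | (result["var"] < -0.25)]

same_rows = by_query.equals(by_mask)  # both keep the rows at index 0 and 2
print(same_rows, by_query.index.tolist())
```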
A word of warning: at least one situation where this is not straightforward is when column names happen to be Python expressions. I had columns named WT_38hph_IP_2, WT_38hph_input_2 and log2(WT_38hph_IP_2/WT_38hph_input_2) and wanted to perform the following query: "(log2(WT_38hph_IP_2/WT_38hph_input_2) > 1) and (WT_38hph_IP_2 > 20)"
I obtained the following exception cascade:
- KeyError: 'log2'
- UndefinedVariableError: name 'log2' is not defined
- ValueError: "log2" is not a supported function
I guess this happened because the query parser was trying to make something from the first two columns instead of identifying the expression with the name of the third column.
A possible workaround is proposed here.
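The linked workaround is not reproduced here. One way to sidestep the parser entirely, sketched under the assumption that a plain boolean mask is acceptable, is to select the awkwardly named column with bracket indexing. The numbers below are invented; only the column names come from the post.

```python
import numpy as np
import pandas as pd

ratio_col = "log2(WT_38hph_IP_2/WT_38hph_input_2)"  # column name from the post

# Made-up numbers purely for illustration.
df = pd.DataFrame({
    "WT_38hph_IP_2": [40.0, 10.0, 80.0],
    "WT_38hph_input_2": [10.0, 10.0, 10.0],
})
df[ratio_col] = np.log2(df["WT_38hph_IP_2"] / df["WT_38hph_input_2"])

# Bracket indexing treats the name as an opaque string, so nothing is parsed.
selected = df[(df[ratio_col] > 1) & (df["WT_38hph_IP_2"] > 20)]
print(selected.index.tolist())
```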
Reply #6
Well, pandas uses the bitwise '&' and '|', and each condition should be wrapped in '()'.
For example, the following works:
data_query = data[(data['year'] >= 2005) & (data['year'] <= 2010)]
But the same query without proper brackets does not:
data_query = data[(data['year'] >= 2005 & data['year'] <= 2010)]
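The reason is operator precedence: '&' binds more tightly than '>=' and '<='. The sketch below uses a made-up year column to show both forms and captures the error raised by the second one.

```python
import pandas as pd

data = pd.DataFrame({"year": [2003, 2005, 2008, 2010, 2012]})  # made-up values

# With parentheses: two boolean Series combined element-wise.
ok = data[(data["year"] >= 2005) & (data["year"] <= 2010)]

# Without them, Python reads the expression as
#   data["year"] >= (2005 & data["year"]) <= 2010
# and the chained comparison needs bool() of a Series, which raises.
try:
    data[data["year"] >= 2005 & data["year"] <= 2010]
    error = None
except ValueError as exc:
    error = str(exc)

print(ok["year"].tolist())
print(error)
```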