Cube Objects

Cube

class cr.cube.cube.Cube(response: Union[str, Dict[KT, VT]], cube_idx: Optional[int] = None, transforms: Optional[Dict[KT, VT]] = None, population: Optional[int] = None, mask_size: int = 0)[source]

Provides access to individual slices on a cube-result.

It also provides some attributes of the overall cube-result.

cube_idx must be None (or omitted) for a single-cube CubeSet. This indicates the CubeSet contains only a single cube and influences behaviors like CA-as-0th.

available_measures[source]

frozenset of available CUBE_MEASURE members in the cube response.

counts_with_missings[source]

ndarray of weighted, unweighted or valid counts including missing values.

Unlike .counts, this property includes values for missing categories.
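The relationship between the two properties can be sketched with plain Python lists. The values and the `is_missing` flags below are hypothetical, not taken from a real cube response; the real properties return ndarrays.

```python
# Hypothetical per-category counts, with the last category flagged as missing.
counts_with_missings = [120, 85, 7]
is_missing = [False, False, True]

# Dropping the missing categories gives the "valid elements only" view.
counts = [c for c, missing in zip(counts_with_missings, is_missing) if not missing]

print(counts_with_missings)
print(counts)
```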

covariance[source]

Optional float64 ndarray of the cube_covariance if the measure exists.

cube_index[source]

Offset of this cube within its CubeSet.

description[source]

Return the description of the cube.

dimension_types[source]

Tuple containing the DIMENSION_TYPE member for each dimension of the cube.

dimensions[source]

List of visible dimensions.

A cube involving a multiple-response (MR) variable has two dimensions for that variable (a subvariables dimension and a categories dimension), but these are “collapsed” into a single effective dimension for cube-user purposes (the categories dimension is suppressed). This collection contains a single dimension for each MR variable and may therefore have fewer dimensions than appear in the cube response.
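As a rough illustration of the collapsing described above, the sketch below filters a list of (name, type) pairs. The type labels `"MR_SUBVAR"`, `"MR_CAT"`, and `"CAT"` and the variable names are assumptions made for this example, not the library's actual objects.

```python
# Hypothetical dimensions as they might appear in a cube response:
# the MR variable contributes a subvariables and a categories dimension.
all_dimensions = [
    ("Pets owned", "MR_SUBVAR"),
    ("Pets owned", "MR_CAT"),
    ("Age", "CAT"),
]

# Suppress the MR categories dimension so each MR variable appears once.
visible_dimensions = [dim for dim in all_dimensions if dim[1] != "MR_CAT"]

print(len(all_dimensions), len(visible_dimensions))
```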

has_weighted_counts[source]

True if cube response has weighted count data.

inflate() → cr.cube.cube.Cube[source]

Return new Cube object with rows-dimension added.

A multi-cube (tabbook) response formed from a function (e.g. mean()) on a numeric variable arrives without a rows-dimension.

means[source]

Optional float64 ndarray of the cube_means if the measure exists.

missing[source]

Get missing count of a cube.

n_responses[source]

Total (int) number of responses considered.

name[source]

Return the name of the cube.

If the cube has 2 dimensions, return the name of the second one. For any other number of dimensions, default to returning the name of the last one. If the cube has no dimensions, return the empty string.
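The naming rule can be sketched as a small function over a list of dimension names. `cube_name` and its argument are hypothetical helpers for illustration, not part of the library. Note that the two-dimension case and the default case both select the last dimension.

```python
def cube_name(dimension_names):
    """Return the last dimension's name, or "" when there are no dimensions."""
    if not dimension_names:
        return ""
    # For two dimensions this is the second one; otherwise it is the last one.
    return dimension_names[-1]


print(cube_name(["Gender", "Age"]))
print(cube_name(["Gender"]))
print(repr(cube_name([])))
```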

ndim[source]

int count of dimensions for this cube.

overlaps[source]

Optional float64 ndarray of cube_overlaps if the measure exists.

The array has as many dimensions as there are defined in the cube query, plus the extra subvariables dimension as the last dimension.

partitions[source]

Sequence of _Slice, _Strand, or _Nub objects from this cube-result.

population_fraction[source]

The filtered/unfiltered ratio for cube response.

This value is required for properly calculating population on a cube where a filter has been applied. Returns 1.0 for an unfiltered cube. Returns np.nan if the unfiltered count is zero, which would otherwise result in a divide-by-zero error.
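A minimal sketch of the ratio and its zero-count guard follows. The function name, its arguments, and the population figure are assumptions for illustration; the real property reads its counts from the cube response and returns np.nan rather than math.nan.

```python
import math


def population_fraction(filtered_count, unfiltered_count):
    """Return filtered/unfiltered; NaN when the unfiltered count is zero."""
    if unfiltered_count == 0:
        return math.nan
    return filtered_count / unfiltered_count


unfiltered = population_fraction(1000, 1000)  # no filter applied
filtered = population_fraction(250, 1000)     # filter keeps a quarter of cases
empty = population_fraction(0, 0)             # guarded divide-by-zero case

# Assumed use: scale a target population by the fraction.
estimated_population = 5_000_000 * filtered
```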

stddev[source]

Optional float64 ndarray of the cube_stddev if the measure exists.

sums[source]

Optional float64 ndarray of the cube_sum if the measure exists.

title[source]

str alternate-name given to cube-result.

This value is suitable for naming a Strand when displayed as a column. In this use-case it is a stand-in for the columns-dimension name since a strand has no columns dimension.

unweighted_counts[source]

ndarray of unweighted counts, valid elements only.

Unweighted counts are drawn from the result.counts field of the cube result. These counts are always present, even when the measure is numeric and there are no count measures. These counts are always unweighted, regardless of whether the cube is “weighted”.

When the cube response contains valid counts, these counts are replaced with the valid counts measure.

unweighted_valid_counts[source]

Optional float64 ndarray of unweighted_valid_counts if the measure exists.

valid_counts_summary_range[source]

Optional (min, max) tuple of summary valid counts.

valid_overlaps[source]

Optional float64 ndarray of cube_valid_overlaps if the measure exists.

The array has as many dimensions as there are defined in the cube query, plus the extra subvariables dimension as the last dimension.

weighted_counts[source]

ndarray of weighted counts, valid elements only.

When the cube response contains valid counts, the weighted counts are replaced with the valid counts measure.

weighted_valid_counts[source]

Optional float64 ndarray of weighted_valid_counts if the measure exists.

CubeSet

class cr.cube.cube.CubeSet(cube_responses: List[Dict[KT, VT]], transforms: Dict[KT, VT], population: int, min_base: int)[source]

Represents a multi-cube cube-response.

Also works just fine for a single cube-response passed inside a sequence, allowing uniform handling of single and multi-cube responses.

cube_responses is a sequence of cube-response dicts received from Crunch. The sequence can contain a single item, such as a cube-response for a slide, but it must be contained in a sequence. A tabbook cube-response sequence can be passed as it was received.

transforms is a sequence of transforms dicts corresponding in order to the cube-responses. population is the estimated target population and is used when a population-projection measure is requested. min_base is an integer representing the minimum sample-size used for indicating values that are unreliable by reason of insufficient sample (base).

available_measures[source]

frozenset of available measures of the first cube in this set.

can_show_pairwise[source]

True if all 2D cubes in a multi-cube set can provide pairwise comparison.

description[source]

str description of first cube in this set.

has_weighted_counts[source]

True if cube-responses include a weighted-count measure.

is_ca_as_0th[source]

True for multi-cube when first cube represents a categorical-array.

A “CA-as-0th” tabbook tab is “3D” in the sense it is “sliced” into one table (partition-set) for each of the CA subvariables.

missing_count[source]

The number of missing values from first cube in this set.

n_responses[source]

Total number of responses considered from first cube in this set.

name[source]

str name of first cube in this set.

partition_sets[source]

Sequence of cube-partition collections across all cubes of this cube-set.

This value might look like the following for a CA-as-0th tabbook:

(
    (_Strand, _Slice, _Slice),
    (_Strand, _Slice, _Slice),
    (_Strand, _Slice, _Slice),
)

and might often look like this for a typical slide:

((_Slice,),)

Each partition set represents the partitions for a single “stacked” table. A 2D slide has a single partition-set of a single _Slice object, as in the second example above. A 3D slide would have multiple partition sets, each of a single _Slice. A tabbook will have multiple partitions in each set, the first being a _Strand and the rest being _Slice objects. Multiple partition sets only arise for a tabbook in the CA-as-0th case.
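One way to picture this structure is as a transpose of the per-cube partitions: the n-th partition of every cube is grouped into the n-th partition set. The sketch below uses strings as stand-ins for the partition objects and assumes that grouping; it is an illustration, not necessarily how the library builds the value.

```python
# Stand-in partitions for a hypothetical CA-as-0th tabbook of three cubes,
# each sliced into three partitions (one per CA subvariable).
partitions_by_cube = [
    ("_Strand", "_Strand", "_Strand"),  # first cube
    ("_Slice", "_Slice", "_Slice"),     # second cube
    ("_Slice", "_Slice", "_Slice"),     # third cube
]

# Group the n-th partition of each cube into the n-th partition set.
tabbook_partition_sets = tuple(zip(*partitions_by_cube))

# A typical 2D slide: one cube with one partition gives one set of one _Slice.
slide_partition_sets = tuple(zip(*[("_Slice",)]))

for partition_set in tabbook_partition_sets:
    print(partition_set)
print(slide_partition_sets)
```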

population_fraction[source]

The filtered/unfiltered ratio for this cube-set.

This value is required for properly calculating population on a cube where a filter has been applied. Returns 1.0 for an unfiltered cube. Returns np.nan if the unfiltered count is zero, which would otherwise result in a divide-by-zero error.

valid_counts_summary_range[source]

The valid count summary values from first cube in this set.