Partition Objects¶

CubePartition¶

class cr.cube.cubepart.CubePartition(cube, transforms=None)[source]¶

A slice, a strand, or a nub drawn from a cube-response.

These represent 2, 1, or 0 dimensions of a cube, respectively.

cube_index[source]¶

Offset of this partition’s cube in its CubeSet.

Used to differentiate certain partitions like a filtered rows-summary strand.

dimension_types[source]¶

Sequence of cr.cube.enum.DIMENSION_TYPE members, one for each dimension.

Items appear in rows-dimension, columns-dimension order.

classmethod factory(cube, slice_idx=0, transforms=None, population=None, ca_as_0th=None, mask_size=0)[source]¶: Return slice, strand, or nub object appropriate to passed parameters.

ndim[source]¶: int count of dimensions for this partition.

population_fraction[source]¶: population fraction of the cube

selected_category_labels[source]¶: Tuple of str: names of any and all underlying categories in ‘Selected’.

shape[source]¶

Tuple of int vector counts for this partition.

Not to be confused with numpy.ndarray.shape, this represents the count of rows and columns, respectively, in this partition. It does not necessarily represent the shape of any underlying numpy.ndarray object that may arise in the implementation of the cube partition. In particular, the value of any count in the shape can be zero.

A _Slice has a shape like (2, 3) representing (row-count, col-count). A _Strand has a shape like (5,) which represents its row-count. The shape of a _Nub is unconditionally () (an empty tuple).
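As a rough illustration of the shape convention above, the length of the tuple identifies the kind of partition. The helper names below are hypothetical and not part of cr.cube; this is only a sketch of how a caller might interpret a shape value.

```python
def partition_kind(shape):
    """Name the partition kind implied by a shape tuple (hypothetical helper)."""
    kinds = {2: "slice", 1: "strand", 0: "nub"}
    return kinds[len(shape)]


def has_empty_dimension(shape):
    """True when any count in the shape is zero, which the docs say is possible."""
    return any(count == 0 for count in shape)


print(partition_kind((2, 3)))   # slice
print(partition_kind((5,)))     # strand
print(partition_kind(()))       # nub
print(has_empty_dimension((0, 3)))  # True
```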

variable_name[source]¶: str representing the name of the superheading variable.

Slice¶

class cr.cube.cubepart._Slice(cube, slice_idx, transforms, population, mask_size)[source]¶

2D cube partition.

A slice represents the cross-tabulation of two dimensions, often, but not necessarily contributed by two different variables. A single CA variable has two dimensions which can be crosstabbed in a slice.

column_aliases[source]¶: 1D str ndarray of alias for each column, for use as column headings.

column_codes[source]¶: 1D int ndarray of code for each column, for use as column headings.

column_index[source]¶

2D np.float64 ndarray of column-index “percentage”.

The index values represent the difference of the percentages to the corresponding baseline values. The baseline values are the univariate percentages of the rows variable.

column_labels[source]¶: 1D str ndarray of name for each column, for use as column headings.

column_proportion_variances[source]¶: 2D ndarray of np.float64 column-proportion variance for each matrix cell.

column_proportions[source]¶

2D np.float64 ndarray of column-proportion for each matrix cell.

This is the proportion of the weighted-N (aka. weighted base) of its column that the weighted-count in each cell represents, generally a number between 0.0 and 1.0. Note that within an inserted subtotal vector involving differences, the values can range between -1.0 and 1.0.
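The relationship described above can be sketched for the simplest case, a CAT x CAT slice with no subtotals, where the weighted base of a column is just the sum of its cells. Nested lists stand in for ndarrays to keep the sketch dependency-free, and the function name is illustrative rather than the library's implementation.

```python
def column_proportions(counts):
    """Divide each weighted count by the weighted-N of its column.

    Assumes a plain CAT x CAT table; MR dimensions have per-cell bases
    and are not handled by this sketch.
    """
    col_totals = [sum(col) for col in zip(*counts)]
    return [
        [cell / col_totals[j] if col_totals[j] else float("nan")
         for j, cell in enumerate(row)]
        for row in counts
    ]


counts = [[10.0, 30.0], [30.0, 10.0]]
print(column_proportions(counts))  # [[0.25, 0.75], [0.75, 0.25]]
```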

column_proportions_moe[source]¶

1D/2D np.float64 ndarray of margin-of-error (MoE) for columns proportions.

The values are represented as fractions, analogous to the column_proportions property. This means that a value of 3.5% is represented as 0.035. The values can be np.nan when the corresponding percentage is also np.nan, which happens when the respective column margin is 0.
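A margin of error is commonly obtained by scaling a standard error by a normal quantile. The sketch below assumes a two-sided 95% confidence level, which this section does not state, and uses hypothetical helper names. It also shows how a fractional value such as 0.035 maps to a displayed 3.5%.

```python
import math

# Two-sided 95% normal quantile; the confidence level is an assumption here.
Z_95 = 1.959963984540054


def moe_from_std_err(std_err, z=Z_95):
    """Scale a standard error (a fraction) into a margin of error (a fraction)."""
    return z * std_err


def format_moe(moe):
    """Render a fractional MoE as a percentage string, or 'n/a' for NaN."""
    if math.isnan(moe):
        return "n/a"
    return "+/-{:.1f}%".format(moe * 100)


print(format_moe(0.035))          # +/-3.5%
print(format_moe(float("nan")))   # n/a
```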

column_share_sum[source]¶

2D optional np.float64 ndarray of column share sum value for each table cell.

Raises ValueError if the cube-result does not include a sum cube-measure.

Column share of sum is the sum of each subvar item divided by the TOTAL number of column items.

column_std_dev[source]¶

standard deviation for column percentages

std_deviation = sqrt(variance)

column_std_err[source]¶

standard error for column percentages

std_error = sqrt(variance/N)
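The two formulas given for column_std_dev and column_std_err can be worked through for a single cell. The variance of a proportion is taken here as p * (1 - p), which is an assumption, since this section does not define it. The function names are illustrative only.

```python
import math


def proportion_variance(p):
    """Binomial variance of a proportion (assumed definition)."""
    return p * (1.0 - p)


def std_dev(variance):
    """std_deviation = sqrt(variance)"""
    return math.sqrt(variance)


def std_err(variance, n):
    """std_error = sqrt(variance / N), where N is the column base."""
    return math.sqrt(variance / n)


variance = proportion_variance(0.5)   # 0.25
print(std_dev(variance))              # 0.5
print(round(std_err(variance, 100), 6))
```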

column_unweighted_bases[source]¶: 2D np.float64 ndarray of unweighted col-proportion denominator per cell.

column_weighted_bases[source]¶: 2D np.float64 ndarray of column-proportion denominator for each cell.

columns_base[source]¶

1D/2D np.float64 ndarray of unweighted-N for each column/cell of slice.

This array is 2D (a distinct base for each cell) when the rows dimension is MR, because each MR-subvariable has its own unweighted N. This is because not every possible response is necessarily offered to every respondent.

In all other cases, the array is 1D, containing one value for each column.
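Because the base can be either 1D or 2D, calling code may find it convenient to normalize it to one value per cell. The following is a minimal sketch with a hypothetical helper, using nested lists in place of ndarrays.

```python
def as_cell_bases(columns_base, n_rows):
    """Return a 2D list with one base value per cell.

    A 2D input (MR rows dimension) is copied as-is; a 1D input is
    repeated once for each of the n_rows rows.
    """
    if columns_base and isinstance(columns_base[0], (list, tuple)):
        return [list(row) for row in columns_base]
    return [list(columns_base) for _ in range(n_rows)]


print(as_cell_bases([40, 60], n_rows=2))              # [[40, 60], [40, 60]]
print(as_cell_bases([[38, 55], [40, 60]], n_rows=2))  # [[38, 55], [40, 60]]
```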

columns_dimension_description[source]¶: str description assigned to columns-dimension.

columns_dimension_name[source]¶

str name assigned to columns-dimension.

Reflects the resolved dimension-name transform cascade.

columns_dimension_type[source]¶: Member of cr.cube.enum.DIMENSION_TYPE describing columns dimension.

columns_margin[source]¶

1D or 2D np.float64 ndarray of weighted-N for each column of slice.

This array is 2D (a distinct margin value for each cell) when the rows dimension is MR, because each MR-subvariable has its own weighted N. This is because not every possible response is necessarily offered to every respondent.

In all other cases, the array is 1D, containing one value for each column.

columns_margin_proportion[source]¶

1D or 2D np.float64 ndarray of weighted-proportion for each column of slice.

This array is 2D (a distinct margin value for each cell) when the rows dimension is MR, because each MR-subvariable has its own weighted N. This is because not every possible response is necessarily offered to every respondent.

In all other cases, the array is 1D, containing one value for each column.

columns_scale_mean[source]¶

Optional 1D np.float64 ndarray of scale mean for each column.

The returned vector is to be interpreted as a summary row. Also note that the underlying scale values are based on the numeric values of the opposing rows-dimension elements.

This value is None if no row element has an assigned numeric value.
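One way to picture the scale mean is as a count-weighted average of the row elements' numeric values, computed separately for each column. The sketch below assumes rows without a numeric value are simply excluded, which this section does not spell out, and the function name is hypothetical.

```python
def columns_scale_mean_sketch(counts, row_numeric_values):
    """Count-weighted mean of row numeric values for each column.

    Returns None when no row has a numeric value, mirroring the
    documented behavior. Rows whose value is None are skipped (assumption).
    """
    scored = [
        (value, row)
        for value, row in zip(row_numeric_values, counts)
        if value is not None
    ]
    if not scored:
        return None
    means = []
    for j in range(len(counts[0])):
        total = sum(row[j] for _, row in scored)
        weighted = sum(value * row[j] for value, row in scored)
        means.append(weighted / total if total else float("nan"))
    return means


counts = [[10, 0], [10, 20], [5, 5]]
print(columns_scale_mean_sketch(counts, [1, 2, None]))        # [1.5, 2.0]
print(columns_scale_mean_sketch(counts, [None, None, None]))  # None
```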

columns_scale_mean_margin[source]¶

Optional float overall mean of column-scale values.

This value is the “margin” of the .columns_scale_mean vector and might typically appear in the cell immediately to the right of the .columns_scale_mean summary-row. It is similar to a “table-total” value, in that it is a scalar that might appear in the lower right-hand corner of a table, but note that it does not represent the overall table in that .rows_scale_mean_margin will not have the same value (except by chance). This value derives from the numeric values of the row elements whereas its counterpart .rows_scale_mean_margin derives from the numeric values of the column elements.

This value is None if no row has an assigned numeric-value.

columns_scale_mean_pairwise_indices[source]¶

Sequence of column-idx tuples indicating pairwise-t result of scale-means.

The sequence contains one tuple for each column. The indices in a column’s tuple each identify another column whose scale-mean is pairwise-significant to that of the tuple’s column. Pairwise significance is computed based on the more restrictive (lesser-value) threshold specified in the analysis.

columns_scale_mean_pairwise_indices_alt[source]¶

Optional sequence of column-idx tuples indicating pairwise-t of scale-means.

This value is None if no secondary threshold value (alpha) was specified in the analysis. Otherwise, it is the same calculation as .columns_scale_mean_pairwise_indices computed using the less restrictive (greater-valued) threshold.

columns_scale_mean_stddev[source]¶

Optional 1D np.float64 ndarray of scale-mean std-deviation for each column.

The returned vector (1D array) is to be interpreted as a summary row. Also note that the underlying scale values are based on the numeric values of the opposing rows-dimension elements.

This value is None if no row element has been assigned a numeric value.

columns_scale_mean_stderr[source]¶

Optional 1D np.float64 ndarray of scale-mean standard-error for each column.

The returned vector is to be interpreted as a summary row. Also note that the underlying scale values are based on the numeric values of the opposing rows-dimension elements.

This value is None if no row element has a numeric value assigned or if the columns-weighted-base is None (e.g. an array variable in the rows dimension).

columns_scale_median[source]¶

Optional 1D np.float64 ndarray of scale median for each column.

The returned vector is to be interpreted as a summary row. Also note that the underlying scale values are based on the numeric values of the opposing rows-dimension elements.

This value is None if no row element has been assigned a numeric value.

columns_scale_median_margin[source]¶

Optional scalar numeric median of all column-scale values.

This value is the “margin” of the .columns_scale_median vector and might typically appear in the cell immediately to the right of the .columns_scale_median summary-row. It is similar to a “table-total” value, in that it is a scalar that might appear in the lower right-hand corner of a table, but note that it does not represent the overall table in that .rows_scale_median_margin will not have the same value (except by chance). This value derives from the numeric values of the row elements whereas its counterpart .rows_scale_median_margin derives from the numeric values of the column elements.

This value is None if no row has an assigned numeric-value.

counts[source]¶: 2D np.float64 ndarray of weighted cube counts.

derived_column_idxs[source]¶

tuple of int index of each derived column-element in slice.

An element is derived if it’s a subvariable of a multiple response dimension, which has been produced by the zz9, and inserted into the response data.

All other elements, including regular MR and CA subvariables, as well as categories of CAT dimensions, are not derived. Subtotals are also not derived in this sense, because they’re not even part of the data (elements).

derived_row_idxs[source]¶

tuple of int index of each derived row-element in slice.

An element is derived if it’s a subvariable of a multiple response dimension, which has been produced by the zz9, and inserted into the response data.

All other elements, including regular MR and CA subvariables, as well as categories of CAT dimensions, are not derived. Subtotals are also not derived in this sense, because they’re not even part of the data (elements).

description[source]¶: str description of this slice, which it takes from its rows-dimension.

diff_column_idxs[source]¶: tuple of int index of each difference column-element in slice.

diff_row_idxs[source]¶: tuple of int index of each difference row-element in slice.

has_scale_means[source]¶: True if the slice has valid columns scale mean.

inserted_column_idxs[source]¶: tuple of int index of each subtotal column in slice.

inserted_row_idxs[source]¶: tuple of int index of each subtotal row in slice.

means[source]¶

2D optional np.float64 ndarray of mean value for each table cell.

Cell value is np.nan for each cell corresponding to an inserted subtotal (mean of addend cells cannot simply be added to get the mean of the subtotal).

Raises ValueError if the cube-result does not include a means cube-measure.

name[source]¶

str name assigned to this slice.

A slice takes the name of its rows-dimension.

pairwise_indices[source]¶

2D ndarray of tuple of int column-idxs meeting pairwise-t threshold.

Like:

[
   [(1, 3, 4), (), (0,), (), ()],
   [(2,), (1, 2), (), (), (0, 3)],
   [(), (), (), (), ()],
]

Has the same shape as .counts. Each int represents the offset of another column in the same row with a confidence interval meeting the threshold defined for this analysis.
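A structure like the one above is often rendered as column letters in each cell of a report. That presentation is an assumption and not something this section prescribes. A minimal sketch, with a hypothetical helper that supports up to 26 columns:

```python
import string


def pairwise_letters(pairwise_indices):
    """Map each tuple of column offsets to a string of column letters."""
    letters = string.ascii_uppercase
    return [
        ["".join(letters[idx] for idx in cell) for cell in row]
        for row in pairwise_indices
    ]


pairwise_indices = [
    [(1, 3, 4), (), (0,), (), ()],
    [(2,), (1, 2), (), (), (0, 3)],
    [(), (), (), (), ()],
]
for row in pairwise_letters(pairwise_indices):
    print(row)
# ['BDE', '', 'A', '', '']
# ['C', 'BC', '', '', 'AD']
# ['', '', '', '', '']
```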

pairwise_indices_alt[source]¶

2D ndarray of tuple of int column-idxs meeting alternate threshold.

This value is None if no alternate threshold has been defined.

pairwise_means_indices[source]¶

Optional 2D ndarray of tuple column-idxs significance threshold for mean.

Like:

[
   [(1, 3, 4), (), (0,), (), ()],
   [(2,), (1, 2), (), (), (0, 3)],
   [(), (), (), (), ()],
]

Has the same shape as .means. Each int represents the offset of another column in the same row with a confidence interval meeting the threshold defined for this analysis.

pairwise_means_indices_alt[source]¶

2D ndarray of tuple of column-idxs meeting alternate threshold for mean.

This value is None if no alternate threshold has been defined.

pairwise_significance_means_p_vals(column_idx)[source]¶: Optional 2D ndarray of means significance p-vals matrices for column idx.

pairwise_significance_means_t_stats(column_idx)[source]¶: Optional 2D ndarray of means significance t-stats matrices for column idx.

pairwise_significance_p_vals(column_idx)[source]¶: 2D ndarray of pairwise-significance p-vals matrices for column idx.

pairwise_significance_t_stats(column_idx)[source]¶: return 2D ndarray of pairwise-significance t-stats for selected column.

pairwise_significance_tests[source]¶

tuple of _ColumnPairwiseSignificance tests.

Result has as many elements as there are columns in the slice. Each significance test contains p_vals and t_stats (ndarrays that represent probability values and statistical scores).

payload_order[source]¶

1D np.int64 ndarray of signed int idx respecting the payload order.

Positive integers indicate the 1-indexed position in payload of regular elements, while negative integers are the subtotal insertions.

Needed for reordering color palette in exporter.
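The sign convention above lets a consumer separate regular elements from subtotal insertions. The sketch below uses a made-up order sequence and hypothetical helper names, and relies only on the rule that negative values mark insertions.

```python
def insertion_positions(order):
    """Display positions (0-based) that hold subtotal insertions."""
    return [pos for pos, idx in enumerate(order) if idx < 0]


def regular_indices(order):
    """Payload indices of regular (non-inserted) elements, in display order."""
    return [idx for idx in order if idx >= 0]


order = [1, 2, -1, 3]  # illustrative values only
print(insertion_positions(order))  # [2]
print(regular_indices(order))      # [1, 2, 3]
```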

population_counts[source]¶

2D np.float64 ndarray of population counts per cell.

The (estimated) population count is computed based on the population value provided when the Slice is created (._population). It is also adjusted to account for any filters that were applied as part of the query (._cube.population_fraction).

._population and _cube.population_fraction are both scalars and so do not affect sort order.
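Since both factors are scalars, the estimate amounts to scaling a proportion for each cell. The sketch below assumes table proportions are the ones being scaled, which is not always the case because the proportion used depends on the dimension types. The function name is hypothetical.

```python
def population_counts_sketch(proportions, population, population_fraction):
    """Scale each cell proportion by population * population_fraction."""
    scale = population * population_fraction
    return [[p * scale for p in row] for row in proportions]


proportions = [[0.25, 0.25], [0.25, 0.25]]
print(population_counts_sketch(proportions, 1000, 0.5))
# [[125.0, 125.0], [125.0, 125.0]]
```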

population_counts_moe[source]¶

2D np.float64 ndarray of population-count margin-of-error (MoE) per cell.

The values are represented as population estimates, analogous to the population_counts property. This means that the values are presented as actual estimated counts of the population. The values can be np.nan when the corresponding percentage is also np.nan, which happens when the respective margin is 0.

When calculating the estimates of categorical dates, the total population is not “divided” between its categories, but rather considered constant for all categorical dates (or waves). Hence, different standard errors are applied in these specific cases (such as row_std_err or column_std_err). If categorical dates are not involved, the standard table_std_err is used.

population_proportions[source]¶

2D np.float64 ndarray of proportions

The proportion used to calculate proportion counts depends on the dimension types.

population_std_err[source]¶

2D np.float64 ndarray of standard errors

The proportion used to calculate proportion counts depends on the dimension types.

pvals[source]¶

2D optional np.float64 ndarray of p-value for each cell.

A p-value is a measure of the probability that an observed difference could have occurred just by random chance. The lower the p-value, the greater the statistical significance of the observed difference.

A cell value of np.nan indicates a meaningful p-value could not be computed for that cell.
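When flagging significant cells, NaN entries need explicit handling because no meaningful p-value exists for them. The sketch below uses a hypothetical helper and an assumed threshold of 0.05, and treats NaN cells as not significant.

```python
import math


def significant_cells(pvals, alpha=0.05):
    """True for each cell whose p-value is defined and below alpha."""
    return [
        [(not math.isnan(p)) and p < alpha for p in row]
        for row in pvals
    ]


pvals = [[0.01, 0.2], [float("nan"), 0.049]]
print(significant_cells(pvals))  # [[True, False], [False, True]]
```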

pvalues¶

2D optional np.float64 ndarray of p-value for each cell.

A p-value is a measure of the probability that an observed difference could have occurred just by random chance. The lower the p-value, the greater the statistical significance of the observed difference.

A cell value of np.nan indicates a meaningful p-value could not be computed for that cell.

residual_test_stats[source]¶

Exposes pvals and zscores (with HS) stacked together

Public method used as cube_method for the SOA API

row_aliases[source]¶

1D str ndarray of row alias for each matrix row.

These are suitable for use as row headings; alias for subtotal rows appear in the sequence and alias are ordered to correspond with their respective data row.

row_codes[source]¶

1D int ndarray of row codes for each matrix row.

These are suitable for use as row headings; codes for subtotal rows appear in the sequence and codes are ordered to correspond with their respective data row.

row_labels[source]¶

1D str ndarray of row name for each matrix row.

These are suitable for use as row headings; labels for subtotal rows appear in the sequence and labels are ordered to correspond with their respective data row.

row_order(format=<ORDER_FORMAT.SIGNED_INDEXES: 0>)[source]¶

1D np.int64 ndarray of idx for each assembled row of matrix.

If the order format is SIGNED_INDEXES, negative values represent inserted subtotal-row locations; for BOGUS_IDS, insertions are represented by an ins_{insertion_id} string.

Indices appear in the order rows are to appear in the final result.

Needed for reordering color palette in exporter.

row_proportion_variances[source]¶: 2D ndarray of np.float64 row-proportion variance for each matrix cell.

row_proportions[source]¶

2D np.float64 ndarray of row-proportion for each matrix cell.

This is the proportion of the weighted-N (aka. weighted base) of its row that the weighted-count in each cell represents, generally a number between 0.0 and 1.0. Note that within an inserted subtotal vector involving differences, the values can range between -1.0 and 1.0.

row_proportions_moe[source]¶

2D np.float64 ndarray of margin-of-error (MoE) for rows proportions.

The values are represented as fractions, analogous to the row_proportions property. This means that a value of 3.5% is represented as 0.035. The values can be np.nan when the corresponding percentage is also np.nan, which happens when the respective table margin is 0.

row_share_sum[source]¶

2D optional np.float64 ndarray of row share sum value for each table cell.

Raises ValueError if the cube-result does not include a sum cube-measure.

Row share of sum is the sum of each subvar item divided by the TOTAL number of row items.

row_std_dev[source]¶: 2D np.float64 ndarray of standard deviation for row percentages.

row_std_err[source]¶: 2D np.float64 ndarray of standard errors for row percentages.

row_unweighted_bases[source]¶: 2D np.float64 ndarray of unweighted row-proportion denominator per cell.

row_weighted_bases[source]¶: 2D np.float64 ndarray of row-proportion denominator for each table cell.

rows_base[source]¶

1D/2D np.float64 ndarray of unweighted-N for each row/cell of slice.

This array is 2D (a distinct base for each cell) when the columns dimension is MR, because each MR-subvariable has its own unweighted N. This is because not every possible response is necessarily offered to every respondent.

In all other cases, the array is 1D, containing one value for each row.

rows_dimension_alias[source]¶: str alias assigned to rows-dimension.

rows_dimension_description[source]¶

str description assigned to rows-dimension.

Reflects the resolved dimension-description transform cascade.

rows_dimension_fills[source]¶

tuple of optional RGB str like “#def032” fill color for each row in slice.

The values reflect the resolved element-fill transform cascade. The length and ordering of the sequence correspond to the rows in the slice, including accounting for insertions and hidden rows. A value of None indicates the default fill, possibly determined by a theme or template.
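Because None means "use the default fill", a renderer typically substitutes a fallback color before drawing. A minimal sketch, in which the default color and helper name are assumptions:

```python
def resolve_fills(fills, default="#999999"):
    """Replace each None with a fallback color; keep explicit fills as-is."""
    return [fill if fill is not None else default for fill in fills]


print(resolve_fills(("#def032", None, "#112233")))
# ['#def032', '#999999', '#112233']
```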

rows_dimension_name[source]¶

str name assigned to rows-dimension.

Reflects the resolved dimension-name transform cascade.

rows_dimension_type[source]¶: Member of cr.cube.enum.DIMENSION_TYPE specifying type of rows dimension.

rows_margin[source]¶

1D or 2D np.float64 ndarray of weighted-N for each row of slice.

This array is 2D (a distinct margin value for each cell) when the columns dimension is MR, because each MR-subvariable has its own weighted N. This is because not every possible response is necessarily offered to every respondent.

In all other cases, the array is 1D, containing one value for each row.

rows_margin_proportion[source]¶

1D or 2D np.float64 ndarray of weighted-proportion for each row of slice.

This array is 2D (a distinct margin value for each cell) when the columns dimension is MR, because each MR-subvariable has its own weighted N. This is because not every possible response is necessarily offered to every respondent.

In all other cases, the array is 1D, containing one value for each row.

rows_scale_mean[source]¶

Optional 1D np.float64 ndarray of scale mean for each row.

The returned vector is to be interpreted as a summary column. Also note that the underlying scale values are based on the numeric values of the opposing columns-dimension elements.

This value is None if no column element has an assigned numeric value.

rows_scale_mean_margin[source]¶

Optional float overall mean of row-scale values.

This value is the “margin” of the .rows_scale_mean vector and might typically appear in the cell immediately below the .rows_scale_mean summary-column. It is similar to a “table-total” value, in that it is a scalar that might appear in the lower right-hand corner of a table, but note that it does not represent the overall table in that .columns_scale_mean_margin will not have the same value (except by chance). This value derives from the numeric values of the column elements whereas its counterpart .columns_scale_mean_margin derives from the numeric values of the row elements.

This value is None if no column has an assigned numeric-value.

rows_scale_mean_stddev[source]¶

Optional 1D np.float64 ndarray of std-deviation of scale-mean for each row.

The returned vector (1D array) is to be interpreted as a summary column. Also note that the underlying scale values are based on the numeric values of the opposing columns-dimension elements.

This value is None if no column elements have an assigned numeric value.

rows_scale_mean_stderr[source]¶

Optional 1D np.float64 ndarray of standard-error of scale-mean for each row.

The returned vector is to be interpreted as a summary column. Also note that the underlying scale values are based on the numeric values of the opposing columns-dimension elements.

This value is None if no column element has a numeric value assigned or if the rows-weighted-base is None (e.g. an array variable in the columns dimension).

rows_scale_median[source]¶

Optional 1D np.float64 ndarray of scale median for each row.

The returned vector is to be interpreted as a summary column. Also note that the underlying scale values are based on the numeric values of the opposing columns-dimension elements.

This value is None if no column element has an assigned numeric value.

rows_scale_median_margin[source]¶

Optional scalar numeric median of all row-scale values.

This value is the “margin” of the .rows_scale_median vector and might typically appear in the cell immediately below the .rows_scale_median summary-column. It is similar to a “table-total” value, in that it is a scalar that might appear in the lower right-hand corner of a table, but note that it does not represent the overall table in that .columns_scale_median_margin will not have the same value (except by chance). This value derives from the numeric values of the column elements whereas its counterpart .columns_scale_median_margin derives from the numeric values of the row elements.

This value is None if no column has an assigned numeric-value.

smoothed_column_index[source]¶

2D np.float64 ndarray of smoothed column-index “percentage”.

If the cube has a smoothing specification in its transforms, this returns the column index smoothed according to the specified algorithm and parameters; otherwise it falls back to the unsmoothed values.

smoothed_column_percentages[source]¶

2D np.float64 ndarray of smoothed column-percentages for each matrix cell.

If the cube has a smoothing specification in its transforms, this returns the column percentages smoothed according to the specified algorithm and parameters; otherwise it falls back to the unsmoothed values.

smoothed_column_proportions[source]¶

2D np.float64 ndarray of smoothed column-proportion for each matrix cell.

This is the proportion of the weighted-count for cell to the weighted-N of the column the cell appears in (aka. column-margin). Generally a number between 0.0 and 1.0 inclusive, but subtotal differences can be between -1.0 and 1.0 inclusive.

If the cube has a smoothing specification in its transforms, this returns the column proportions smoothed according to the specified algorithm and parameters; otherwise it falls back to the unsmoothed values.

smoothed_columns_scale_mean[source]¶

Optional 1D np.float64 ndarray of smoothed scale mean for each column.

If the cube has a smoothing specification in its transforms, this returns the column scale mean smoothed according to the specified algorithm and parameters; otherwise it falls back to the unsmoothed values.

smoothed_means[source]¶

2D optional np.float64 ndarray of smoothed mean value for each table cell.

If the cube has a smoothing specification in its transforms, this returns the means smoothed according to the specified algorithm and parameters; otherwise it falls back to the unsmoothed values.

stddev[source]¶

2D optional np.float64 ndarray of stddev value for each table cell.

Raises ValueError if the cube-result does not include a stddev cube-measure.

sums[source]¶

2D optional np.float64 ndarray of sum value for each table cell.

Raises ValueError if the cube-result does not include a sum cube-measure.

tab_alias[source]¶: Subvar alias of slice id if first dimension is a CA_SUBVAR, "" (empty string) otherwise.

tab_label[source]¶: Subvar label of slice id if first dimension is a CA_SUBVAR, "" (empty string) otherwise.

table_base[source]¶

Scalar or 1D/2D np.float64 ndarray of unweighted-N for table.

This value is scalar when the slice has no MR dimensions, 1D when the slice has one MR dimension (either MR_X or X_MR), and 2D for an MR_X_MR slice.

The caller must know the dimensionality of the slice in order to correctly interpret a 1D value for this property.

This value has four distinct forms, depending on the slice dimensions:

ARR_X_ARR - 2D ndarray with a distinct table-base value per cell.

ARR_X - 1D ndarray of value per row when only rows dimension is ARR.

X_ARR - 1D ndarray of value per column when only col dimension is ARR

CAT_X_CAT - scalar float value when slice has no MR dimension.

table_base_range[source]¶

[min, max] np.float64 ndarray range of the table_base (table-unweighted-base)

A CAT_X_CAT has a scalar for all table-unweighted-bases, but arrays have more than one table-unweighted-base. This collapses all of the values to a range, and it is “unpruned”, meaning that it is calculated before any hiding or removing of empty rows/columns.
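Collapsing a scalar, 1D, or 2D base into a [min, max] pair can be sketched as follows. Nested lists stand in for ndarrays and the helper name is hypothetical. Pruning is not modeled, consistent with the range being computed before any rows or columns are hidden.

```python
def base_range(table_base):
    """Return [min, max] over a scalar, 1D list, or 2D nested list."""
    def flatten(value):
        if isinstance(value, (list, tuple)):
            for item in value:
                yield from flatten(item)
        else:
            yield value

    values = list(flatten(table_base))
    return [min(values), max(values)]


print(base_range(100.0))                 # [100.0, 100.0]
print(base_range([80, 95, 90]))          # [80, 95]
print(base_range([[80, 95], [70, 99]]))  # [70, 99]
```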

table_margin[source]¶

Scalar or 1D/2D np.float64 ndarray of weighted-N table.

This value is scalar when the slice has no MR dimensions, 1D when the slice has one MR dimension (either MR_X or X_MR), and 2D for an MR_X_MR slice.

The caller must know the dimensionality of the slice in order to correctly interpret a 1D value for this property.

This value has four distinct forms, depending on the slice dimensions:

CAT_X_CAT - scalar float value when slice has no ARRAY dimension.

ARRAY_X - 1D ndarray of value per row when only rows dimension is ARRAY.

X_ARRAY - 1D ndarray of value per column when only column is ARRAY.

ARRAY_X_ARRAY - 2D ndarray with a distinct table-margin value per cell.

table_margin_range[source]¶

[min, max] np.float64 ndarray range of the table_margin (table-weighted-base)

A CAT_X_CAT has a scalar for all table-weighted-bases, but arrays have more than one table-weighted-base. This collapses all of the values to a range, and it is “unpruned”, meaning that it is calculated before any hiding or removing of empty rows/columns.

table_name[source]¶

Optional table name for this Slice

Provides differentiated name for each stacked table of a 3D cube.

table_proportion_variances[source]¶: 2D ndarray of np.float64 table-proportion variance for each matrix cell.

table_proportions[source]¶

2D ndarray of np.float64 fraction of table count each cell contributes.

This is the proportion of the weighted-count for cell to the weighted-N of the table the cell appears in (aka. table-margin). Generally a number between 0.0 and 1.0 inclusive, but subtotal differences can be between -1.0 and 1.0 inclusive.

table_proportions_moe[source]¶

1D/2D np.float64 ndarray of margin-of-error (MoE) for table proportions.

The values are represented as fractions, analogous to the table_proportions property. This means that a value of 3.5% is represented as 0.035. The values can be np.nan when the corresponding percentage is also np.nan, which happens when the respective table margin is 0.

table_std_dev[source]¶: 2D np.float64 ndarray of std-dev of table-percent for each table cell.

table_std_err[source]¶

2D optional np.float64 ndarray of std-error of table-percent for each cell.

A cell value can be np.nan under certain conditions.

table_unweighted_bases[source]¶: 2D np.float64 ndarray of unweighted table-proportion denominator per cell.

table_weighted_bases[source]¶: 2D np.float64 ndarray of table-proportion denominator for each cell.

total_share_sum[source]¶

2D optional np.float64 ndarray of total share sum value for each table cell.

Raises ValueError if the cube-result does not include a sum cube-measure.

Total share of sum is the sum of each subvar item divided by the TOTAL of items.

unweighted_counts[source]¶: 2D np.float64 ndarray of unweighted count for each slice matrix cell.

weighted_counts¶: 2D np.float64 ndarray of weighted cube counts.

zscores[source]¶

2D np.float64 ndarray of std-res value for each cell of matrix.

A z-score is also known as a standard score and is the number of standard deviations above (positive) or below (negative) the population mean a cell’s value is.
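This section describes the values only as standardized residuals. One common definition, assumed here, is the adjusted residual from a chi-square test of independence on a two-way table. The sketch below applies that definition to a plain CAT x CAT table using nested lists; it is not the library's implementation.

```python
import math


def standardized_residuals(counts):
    """Adjusted standardized residual for each cell of a two-way table.

    residual = (observed - expected) / sqrt(expected * (1 - row_p) * (1 - col_p))
    Cells with zero variance yield NaN.
    """
    total = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    result = []
    for i, row in enumerate(counts):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            variance = (
                expected
                * (1 - row_totals[i] / total)
                * (1 - col_totals[j] / total)
            )
            if variance <= 0:
                out.append(float("nan"))
            else:
                out.append((observed - expected) / math.sqrt(variance))
        result.append(out)
    return result


z = standardized_residuals([[10, 30], [30, 10]])
print([[round(v, 3) for v in row] for row in z])
# [[-4.472, 4.472], [4.472, -4.472]]
```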

Strand¶

class cr.cube.cubepart._Strand(cube, transforms, population, ca_as_0th, slice_idx, mask_size)[source]¶

1D cube-partition.

A strand can arise from a 1D cube (non-CA univariate), or as a partition of a CA-cube (CAs are 2D) into a sequence of 1D partitions, one for each subvariable.

counts¶

1D np.float64 ndarray of weighted count for each row of strand.

The values are int when the underlying cube-result has no weighting.

derived_row_idxs[source]¶

tuple of int index of each derived row-element in this strand.

Subtotals cannot be derived

An element is derived if it’s a subvariable of a multiple response dimension, which has been produced by the zz9, and inserted into the response data.

All other elements, including regular MR and CA subvariables, as well as categories of CAT dimensions, are not derived. Subtotals are also not derived in this sense, because they’re not even part of the data (elements).

diff_row_idxs[source]¶

tuple of int index of each difference row-element in this strand.

Valid elements cannot be differences; only some subtotals can.

has_scale_means[source]¶: True if the strand has valid scale means.

inserted_row_idxs[source]¶

tuple of int index of each inserted row in this strand.

Suitable for use in applying different formatting (e.g. Bold) to inserted rows. Provided index values correspond to measure values as-delivered by this strand, after any insertion of subtotals, re-ordering, and hiding/pruning of rows specified in a transform has been applied.

Provided index values correspond to rows after any insertion of subtotals, re-ordering, and hiding/pruning.

means[source]¶

1D np.float64 ndarray of mean for each row of strand.

Raises ValueError when accessed on a cube-result that does not contain a means cube-measure.

min_base_size_mask[source]¶

1D bool ndarray of True for each row that fails to meet min-base spec.

The “base” is the physical (unweighted) count of respondents to the question. When this is lower than a specified threshold, the reliability of the value is too low to be meaningful. The threshold is defined by the caller (user).
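The mask logic can be pictured as a simple comparison of each row's unweighted base against the caller's threshold. The sketch below uses a hypothetical function and made-up values, and assumes a base exactly equal to the threshold meets the requirement.

```python
def below_min_base(unweighted_bases, min_base):
    """True for each row whose unweighted base falls short of min_base."""
    return [base < min_base for base in unweighted_bases]


print(below_min_base([120, 45, 30], min_base=50))  # [False, True, True]
```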

payload_order[source]¶

1D np.int64 ndarray of signed int idx respecting the payload order.

Positive integers indicate the 1-indexed position in payload of regular elements, while negative integers are the subtotal insertions.

Needed for reordering color palette in exporter.
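One way a caller might interpret these signed values when reordering a palette is sketched below. How the exporter actually consumes `payload_order` is not described here, so `reorder_palette`, the colors, and the order values are all assumptions.

```python
def reorder_palette(palette, payload_order, subtotal_color=None):
    """Pick a color for each row from its signed payload index.

    Positive values are treated as 1-indexed positions into `palette`;
    negative values (subtotal insertions) get `subtotal_color`.
    """
    return [
        palette[idx - 1] if idx > 0 else subtotal_color
        for idx in payload_order
    ]


# Hypothetical palette in payload order, and a reordered strand with one subtotal.
palette = ["red", "green", "blue"]
payload_order = [3, -1, 1, 2]

colors = reorder_palette(palette, payload_order)
```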

population_counts[source]¶

1D np.float64 ndarray of population count for each row of strand.

The (estimated) population count is computed based on the population value provided when the Strand is created. It is also adjusted to account for any filters that were applied as part of the query.
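A minimal sketch of the computation, assuming the estimate is the product of each row's population proportion, the caller-supplied population, and the cube's population fraction (the filter adjustment). The function name and numbers are illustrative.

```python
def estimate_population_counts(population_proportions, population, population_fraction):
    """Scale each row's proportion up to an estimated population count."""
    return [p * population * population_fraction for p in population_proportions]


# Hypothetical inputs: a population of 1000, half of which passes the filter.
estimated = estimate_population_counts(
    [0.25, 0.75], population=1000, population_fraction=0.5
)
```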

population_counts_moe[source]¶

1D np.float64 ndarray of population margin-of-error (MoE) for table percents.

The values are represented as population estimates, analogous to the population_counts property. This means that the values are expressed as estimated counts of the population. The values can be np.nan when the corresponding percentage is also np.nan, which happens when the respective table margin is 0.

population_proportion_stderrs[source]¶

1D np.float64 population-proportion-standard-error for each row

Generally equal to table_proportion_stderrs, but because we don’t divide the population when the row is a CAT_DATE, can also be all 0s. Used to calculate population_counts_moe.

population_proportions[source]¶

1D np.float64 population-proportion for each row

Generally equal to table_proportions, but because we don’t divide the population when the row is a CAT_DATE, can also be all 1s. Used to calculate population_counts.

row_aliases[source]¶: 1D str ndarray of alias for each row, for use as row headings.

row_codes[source]¶: 1D int ndarray of code for each row, for use as row headings.

row_count[source]¶

int count of rows in a returned measure or marginal.

This count includes inserted rows but not rows that have been hidden/pruned.

row_labels[source]¶: 1D str ndarray of name for each row, suitable for use as row headings.

row_order(format=<ORDER_FORMAT.SIGNED_INDEXES: 0>)[source]¶

1D np.int64 ndarray of idx for each assembled row of stripe.

If the order format is SIGNED_INDEXES, negative values represent inserted subtotal-row locations; for BOGUS_IDS, insertions are represented by an ins_{insertion_id} string. Indices appear in the order rows are to appear in the final result.

Needed for reordering color palette in exporter.

rows_base[source]¶: 1D np.float64 ndarray of unweighted-N for each row of strand.

rows_dimension_alias[source]¶: str alias assigned to rows-dimension.

rows_dimension_description[source]¶

str description assigned to rows-dimension.

Reflects the resolved dimension-description transform cascade.

rows_dimension_fills[source]¶

tuple of optional RGB str like “#def032” fill color for each strand row.

Each value reflects the resolved element-fill transform cascade. The length and ordering of the sequence correspond to the rows in the slice, including accounting for insertions, ordering, and hidden rows. A fill value is None when no explicit fill color is defined for that row, indicating the default fill color for that row should be used, probably coming from a caller-defined theme.

rows_dimension_name[source]¶

str name assigned to rows-dimension.

Reflects the resolved dimension-name transform cascade.

rows_dimension_type[source]¶: Member of DIMENSION_TYPE enum describing type of rows dimension.

rows_margin[source]¶: 1D np.float64 ndarray of weighted-N for each row of strand.

scale_mean[source]¶

Optional float mean of row numeric-values (scale).

This value is None when no row-elements have a numeric-value assigned. The numeric value (aka. “scale”) for a row is its count multiplied by the numeric-value of its element. For example, if 100 women responded “Very Likely” and the numeric-value of the “Very Likely” response (element) was 4, then the scale for that row would be 400. The scale mean is the average of those scale values over the total count of responses.
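The worked example above can be extended into a short sketch. It assumes that rows without a numeric value are left out of both the numerator and the denominator; `scale_mean` here is a standalone illustration, not the library's implementation.

```python
def scale_mean(counts, numeric_values):
    """Mean of numeric values weighted by row count, or None if unavailable."""
    pairs = [(c, v) for c, v in zip(counts, numeric_values) if v is not None]
    total = sum(c for c, _ in pairs)
    if not pairs or total == 0:
        return None
    # Each row's "scale" is its count times its numeric value.
    return sum(c * v for c, v in pairs) / total


# 100 responses valued 4 (scale 400), 50 valued 1 (scale 50),
# and 25 in a row with no numeric value (assumed to be ignored).
mean = scale_mean([100, 50, 25], [4, 1, None])
```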

scale_median[source]¶

Optional int/float median of scaled weighted-counts.

This value is None when no rows have a numeric-value assigned.

scale_std_dev[source]¶

Optional np.float64 standard-deviation of scaled weighted counts.

This value is None when no rows have a numeric-value assigned.

scale_std_err[source]¶

Optional np.float64 standard-error of scaled weighted counts.

This value is None when no rows have a numeric-value assigned. The value has the same units as the assigned numeric values and indicates the dispersion of the scaled-count distribution from its mean (scale-mean).
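For illustration, the standard deviation and standard error of a scaled-count distribution might be computed as follows. This sketch assumes a population (not sample) variance weighted by row counts, and a standard error of sqrt(variance / N); the library's exact formula may differ.

```python
import math


def scale_dispersion(counts, numeric_values):
    """Return (std_dev, std_err) of count-weighted numeric values, or None."""
    pairs = [(c, v) for c, v in zip(counts, numeric_values) if v is not None]
    total = sum(c for c, _ in pairs)
    if not pairs or total == 0:
        return None
    mean = sum(c * v for c, v in pairs) / total
    # Assumed: population variance, weighted by the count of each row.
    variance = sum(c * (v - mean) ** 2 for c, v in pairs) / total
    return math.sqrt(variance), math.sqrt(variance / total)


# 50 responses valued 1 and 50 valued 3: mean 2, variance 1.
std_dev, std_err = scale_dispersion([50.0, 50.0], [1, 3])
```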

scale_stddev¶

Optional np.float64 standard-deviation of scaled weighted counts.

This value is None when no rows have a numeric-value assigned.

scale_stderr¶

Optional np.float64 standard-error of scaled weighted counts.

This value is None when no rows have a numeric-value assigned. The value has the same units as the assigned numeric values and indicates the dispersion of the scaled-count distribution from its mean (scale-mean).

shape[source]¶

Tuple of int vector counts for this partition.

A _Strand has a shape like (5,) which represents its row-count.

Not to be confused with numpy.ndarray.shape, this represents the count of rows in this strand. It does not necessarily represent the shape of any underlying numpy.ndarray object. In particular, the value of its row-count can be zero.

share_sum[source]¶

1D np.float64 ndarray of share of sum for each row of strand.

Raises ValueError if the cube-result does not include a sum cube-measure.

Share of sum is the sum of each subvar item divided by the TOTAL number of items.
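A small sketch of this ratio, assuming "TOTAL" means the sum over all rows of the strand. The function name and values are illustrative.

```python
def share_of_sum(sums):
    """Each row's sum as a fraction of the total across all rows."""
    total = sum(sums)
    return [s / total for s in sums]


# Hypothetical per-row sums for three subvariables.
shares = share_of_sum([10.0, 30.0, 60.0])
```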

smoothed_means[source]¶

1D np.float64 ndarray of smoothed mean for each row of strand.

If the cube has a smoothing specification in its transforms, this returns the means smoothed according to the specified algorithm and parameters; otherwise it falls back to the unsmoothed values.
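As one example of such an algorithm, a trailing (one-sided) moving average could look like the sketch below. The window size, the NaN padding of leading rows, and the function name are assumptions made for illustration, not a description of the library's smoother.

```python
def one_sided_moving_avg(values, window=2):
    """Average each value with the (window - 1) values that precede it.

    Rows without enough preceding values are reported as NaN.
    """
    nan = float("nan")
    smoothed = []
    for i in range(len(values)):
        if i < window - 1:
            smoothed.append(nan)
        else:
            chunk = values[i - window + 1 : i + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


# Hypothetical means for four consecutive rows (e.g. dates).
smoothed = one_sided_moving_avg([1.0, 2.0, 3.0, 4.0], window=2)
```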

stddev[source]¶

1D np.float64 ndarray of stddev for each row of strand.

Raises ValueError when accessed on a cube-result that does not contain a stddev cube-measure.

sums[source]¶

1D np.float64 ndarray of sum for each row of strand.

Raises ValueError when accessed on a cube-result that does not contain a sum cube-measure.

tab_alias[source]¶: Subvar alias of strand if first dimension is a CA_SUBVAR, "" otherwise.

tab_label[source]¶: Subvar label of strand if first dimension is a CA_SUBVAR, "" otherwise.

table_base_range[source]¶

[min, max] np.float64 ndarray range of unweighted-N for this stripe.

A non-MR stripe will have a single base, represented by min and max being the same value. Each row of an MR stripe has a distinct base, which is reduced to a range in that case.
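The reduction to a range can be sketched with plain lists standing in for the per-row bases; `base_range` and the sample values are hypothetical.

```python
def base_range(unweighted_bases):
    """Reduce per-row bases to a [min, max] pair."""
    return [min(unweighted_bases), max(unweighted_bases)]


# Non-MR: every row shares the same base, so min and max coincide.
non_mr = base_range([100.0, 100.0, 100.0])

# MR: each subvariable has its own base.
mr = base_range([80.0, 95.0, 90.0])
```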

table_margin_range[source]¶

[min, max] np.float64 ndarray range of (total) weighted-N for this stripe.

A non-MR stripe will have a single margin, represented by min and max being the same value. Each row of an MR stripe has a distinct margin, which is reduced to a range in that case.

table_name[source]¶

Optional table name for this strand

Only for CA-as-0th case, provides differentiated names for stacked tables.

table_percentages[source]¶

1D np.float64 ndarray of table-percentage for each row.

Table-percentage is the fraction of the table weighted-N contributed by each row, expressed as a percentage (float between 0.0 and 100.0 inclusive).
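A minimal sketch of this calculation, assuming each row's weighted count is divided by that row's weighted base (the same for every row unless the dimension is MR). The function name and values are illustrative.

```python
def percentages(weighted_counts, weighted_bases):
    """Weighted count as a percentage of the weighted base, row by row."""
    return [
        100.0 * count / base if base else float("nan")
        for count, base in zip(weighted_counts, weighted_bases)
    ]


# Non-MR strand: all rows share one base.
cat_pcts = percentages([25.0, 75.0], [100.0, 100.0])

# MR strand: each row has its own base.
mr_pcts = percentages([30.0, 20.0], [60.0, 80.0])
```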

table_proportion_moes[source]¶

1D np.float64 ndarray of table-proportion margin-of-error (MoE) for each row.

The values are represented as fractions, analogous to the table_proportions property. This means that 3.5% is represented as 0.035. The values can be np.nan when the corresponding proportion is also np.nan, which happens when the respective columns margin is 0.
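To make the units concrete, here is a hedged sketch of a margin-of-error calculation. It assumes a 95% confidence level (z of about 1.96) and the usual binomial standard error sqrt(p(1 - p) / N); neither the confidence level nor the formula is stated above, so treat both as assumptions.

```python
import math

Z_95 = 1.959964  # assumed critical value for a 95% confidence level


def proportion_moe(proportion, base):
    """Margin of error of a proportion, expressed as a fraction."""
    if base == 0:
        # With an empty margin the proportion itself is undefined.
        return float("nan")
    stderr = math.sqrt(proportion * (1 - proportion) / base)
    return Z_95 * stderr


# A proportion of 0.5 on a base of 100 gives roughly 0.098 (9.8 points).
moe = proportion_moe(0.5, 100)
empty = proportion_moe(0.5, 0)
```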

table_proportion_stddevs[source]¶: 1D np.float64 ndarray of table-proportion std-deviation for each row.

table_proportion_stderrs[source]¶: 1D np.float64 ndarray of table-proportion std-error for each row.

table_proportions[source]¶

1D np.float64 ndarray of fraction of weighted-N contributed by each row.

The proportion is expressed as a float between 0.0 and 1.0 inclusive.

title[source]¶

The str display name of this strand, suitable for use as a column heading.

Strand.name is the rows-dimension name, which is suitable for use as a title of the row-headings. However, a strand can also appear as a column and this value is a suitable name for such a column.

unweighted_bases[source]¶

1D np.float64 ndarray of base count for each row, before weighting.

When the rows dimension is multiple-response (MR), each value is different, reflecting the base for that individual subvariable. In all other cases, the table base is repeated for each row.

unweighted_counts[source]¶: 1D np.float64 ndarray of unweighted count for each row of stripe.

weighted_bases[source]¶

1D np.float64 ndarray of table-proportion denominator for each row.

For a non-MR strand, all values in the array are the same. For an MR strand, each value may be different, reflecting the fact that not all response options were necessarily presented to all respondents.

weighted_counts[source]¶

1D np.float64 ndarray of weighted count for each row of strand.

The values are int when the underlying cube-result has no weighting.

Nub¶

class cr.cube.cubepart._Nub(cube, transforms=None)[source]¶

0D slice.

is_empty[source]¶: True if the partition has no counts, False otherwise

means[source]¶: Float scalar representing the mean.

table_base[source]¶: Int scalar of the unweighted N of the table.

unweighted_count[source]¶: Integer scalar of total unweighted count of the table