NEWS

madshapR 2025-06-15

Attention: Some changes to functions in the current version of madshapR may require updates of existing code.

Superseded object.

| previous version (1.1.0 and older) | version 2.0.0 | |------------------------------------|-------------------| | madshapR_DEMO | madshapR_examples |

Superseded parameters.

| previous version (1.1.0 and older) | current version (2.0.0) | |----|----| | dataset_evaluate(as_data_dict_mlstr) | dataset_evaluate(is_data_dict_mlstr) | | data_dict_evaluate(as_data_dict_mlstr) | data_dict_evaluate(is_data_dict_mlstr) | | dossier_evaluate(as_data_dict_mlstr) | dossier_evaluate(is_data_dict_mlstr) |

Superseded function behaviors and/or output structures.

In dataset_evaluate(), data_dict_evaluate() and dossier_evaluate(), the columns generated in the outputs have been renamed as follows :

| previous version (1.1.0 and older) | current version (2.0.0) | |------------------------------------|-------------------------------| | index | Index | | name | Variable name | | label | Variable label | | valueType | Data dictionary valueType | | Categories::label | Categories in data dictionary | | Categories::missing | Non-valid categories |

In dataset_summarize() and dossier_summarize(), the columns generated in the outputs have been renamed as follows :

| previous version (1.1.0 and older) | current version (2.0.0) | |------------------------------------|----------------------------| | index in data dict.name | Index | | name | Variable name | | label | Variable label | | Estimated dataset valueType | Suggested valueType | | Actual dataset valueType | Dataset valueType | | Total number of observations | Number of rows | | Nb. distinct values | Number of distinct values | | Nb. valid values | Number of valid values | | Nb. non-valid values | Number of non-valid values | | Nb. NA | Number of empty values | | % total Valid values | % Valid values | | % Non-valid values | % Non-valid values | | % NA | % Empty values | | ———————————— | ——————————— |

Bug fixes and improvements

The package now handles the valueType datetime, which formerly was considered either as a text or date.

https://github.com/maelstrom-research/madshapR/issues/123

https://github.com/maelstrom-research/madshapR/issues/112

https://github.com/maelstrom-research/madshapR/issues/75

The valueType object (present in columns in a data dictionary or as an attribute of a variable) had some errors and bugs that have been corrected.

https://github.com/maelstrom-research/madshapR/issues/87

https://github.com/maelstrom-research/madshapR/issues/82

https://github.com/maelstrom-research/madshapR/issues/81

https://github.com/maelstrom-research/madshapR/issues/76

When a column in a dataset is all NA (empty), the previous version had some issues that have been has been corrected.

https://github.com/maelstrom-research/madshapR/issues/116

https://github.com/maelstrom-research/madshapR/issues/115

https://github.com/maelstrom-research/madshapR/issues/109

A bug in data_dict_pivot_longer() when ‘Source’ or ‘Target’ column was not present has been corrected.

https://github.com/maelstrom-research/madshapR/issues/86

The SPSS format, which haven package uses to produce labelled variables, define integers different form madshapR, which ultimately . That has been taken in account and corrected.
The SPSS format in the haven package used to produce labelled variables defines integers differently from madshapR, which was causing errors. The difference has been taken into account.

https://github.com/maelstrom-research/madshapR/issues/83

The group_by parameter has been redesigned.

dataset_preprocess() now handles grouped dataset, using parameter “group_by”.
Users can now define groups in summaries and visual reports using a variable that is not categorical or has empty values.

https://github.com/maelstrom-research/madshapR/issues/47

Previously, the “group_by” argument had some flaws, resulting in bugs that have been corrected.

https://github.com/maelstrom-research/madshapR/issues/114

https://github.com/maelstrom-research/madshapR/issues/113

https://github.com/maelstrom-research/madshapR/issues/110

https://github.com/maelstrom-research/madshapR/issues/105

Enhancements in the assessment and summary reports!

The assessment and summary reports had some updates, such as renamed columns and bug corrections.

https://github.com/maelstrom-research/madshapR/issues/126

https://github.com/maelstrom-research/madshapR/issues/104

https://github.com/maelstrom-research/madshapR/issues/98

https://github.com/maelstrom-research/madshapR/issues/97

https://github.com/maelstrom-research/madshapR/issues/96

https://github.com/maelstrom-research/madshapR/issues/95

https://github.com/maelstrom-research/madshapR/issues/94

https://github.com/maelstrom-research/madshapR/issues/93

https://github.com/maelstrom-research/madshapR/issues/92

https://github.com/maelstrom-research/madshapR/issues/91

https://github.com/maelstrom-research/madshapR/issues/90

https://github.com/maelstrom-research/madshapR/issues/89

https://github.com/maelstrom-research/madshapR/issues/88

https://github.com/maelstrom-research/madshapR/issues/85

https://github.com/maelstrom-research/madshapR/issues/80

https://github.com/maelstrom-research/madshapR/issues/79

Enhancements in the visual reports!

The visual reports have been improved, including better visual outputs and color palettes, and new features such as total number of rows next to the bar charts.

https://github.com/maelstrom-research/madshapR/issues/108

https://github.com/maelstrom-research/madshapR/issues/107

https://github.com/maelstrom-research/madshapR/issues/106

https://github.com/maelstrom-research/madshapR/issues/100

https://github.com/maelstrom-research/madshapR/issues/84

https://github.com/maelstrom-research/madshapR/issues/64

New functions

typeof_convert_to_valueType() converts typeof (and class if any) into its corresponding valueType.
valueType_convert_to_typeof() converts valueType into its corresponding typeof and class in R representation.
data_dict_update() updates a data dictionary from a dataset.
data_dict_trim_labels() adds shortened labels to data dictionary.
first_label_get() gets the first label from a data dictionary.
has_categories() tests if a dataset has categorical variables.

madshapR 1.1.0 (2024-04-23)

Bug fixes and improvements

for assessment, summary and visualization, the character columns in dataset are put to lower to avoid duplicated informations in outputs.

https://github.com/maelstrom-research/madshapR/issues/63

bug in the function variable_visualize() when the column was empty after removing internally stopwords.

https://github.com/maelstrom-research/Rmonize/issues/53

https://github.com/maelstrom-research/Rmonize/issues/49

Some elements were missing in the reports in dataset_evaluate()

https://github.com/maelstrom-research/madshapR/issues/66

Problem with names containing underscores in variables when visualized fixed.

https://github.com/maelstrom-research/madshapR/issues/62

Functions involving valueType (such as data_dict_apply(),valueType_guess() and valueType_adujst()) have been corrected to be more consistent in the usage of these functions.

https://github.com/maelstrom-research/madshapR/issues/61

The bug affecting tibbles which contain a variable named “test” has been corrected in the package fabR.

https://github.com/maelstrom-research/madshapR/issues/60

functions such as data_dict_summarize() and dataset_evaluate() have cells in tibble generated that can have more than accepted characters in a cell in Excel. the function truncates the cells in tibbles to a maximum of 10000 characters.

https://github.com/maelstrom-research/madshapR/issues/59

Problem with dataType in the function dataset_cat_as_labels() when the values found in the dataset are not in the data dictionary, and the valueType is text, and the dataType is “integer” has been fixed.

https://github.com/maelstrom-research/madshapR/issues/58

Functions involving date formatted variables have been corrected in the package fabR.

https://github.com/maelstrom-research/madshapR/issues/57

The inconsistent error in dataset_evaluate() has been corrected in the package fabR.

https://github.com/maelstrom-research/madshapR/issues/46

deprecated functions

To avoid confusion with help(function), the function madshapR_help() has been renamed madshapR_website().

Dependency changes

set a minimum dplyr dependence to avoid bugs

madshapR 1.0.3 (2023-12-19)

Bug fixes and improvements

Some of the tests were made with another package (Rmonize) which as “madshapR” as a dependence.

Enhance reports

in visual reports, void confusing changes in color scheme in visual reports.
Histograms for date variables display valid ranges.
in reports, change % NA as proportion in reports.
dossier_visualize() report shows variable labels in the same lang.
in visual reports, the bar plot only appears when there are multiple missing value types, otherwise only the pie chart is shown.
in reports, all of the percentages are now included under “Other values (non categorical)”, which gives a single value.
https://github.com/maelstrom-research/madshapR/issues/51

suppress overwrite parameter in dataset_visualize().

https://github.com/maelstrom-research/madshapR/issues/42

in dataset_summary() minor issue (consistency in column names and content).

correction of the function variable_visualize() when valueType_guess = TRUE

Correct Data dictionary functions

https://github.com/maelstrom-research/madshapR/issues/50

enhance the function check_data_dict_valueType(), which was too slow.

https://github.com/maelstrom-research/madshapR/issues/49

valueType_adjust() now works with empty column (all NAs)

allow the format date to be transformed into text in dataset_zap_data_dict() when the format is unclear.

New functions

col_id() function which is a short cut for calling the attribute madshapR::col_id of a dataset.
as_category(),is_category(),drop_category() function which coerces a vector as a categorical object. Typically a column in a dataset that needs to be coerced into a categorical variable (The data dictionary is updated accordingly).

Deprecated functions

Rename and update example rda Object (in data) of DEMO_files into madshapR_DEMO for consistency across our other packages.

madshapR 1.0.2 (2023-10-09)

Creation of NEWS feed !!

Addition of NEWS.md for the development version use “(development version)”.

Bug fixes and improvements

Some improvements in the documentation of the package has been made.
internal call of libraries (using ::) has been replaced by proper import in the declaration function.
get functions in fabR have been changed in its last release. the functions using them as dependencies ( check_xxx()) have been updated accordingly.
DEMO files no longer include harmonization files that are now in the package harmonizR

Dependency changes

New Imports: haven, lifecycle

No longer in Imports: xfun

New functions

These functions are imported from fabR

bookdown_template() replaces the deprecated function bookdown_template().
bookdown_render() which renders a Rmd collection of files into a docs/index.html website.
bookdown_open() Which allows to open a docs/index.html document when the bookdown is rendered

This separation into 3 functions will allow future developments, such as render as a ppt or pdf.

deprecated functions

Due to another package development (see fabR), The function open_visual_report() has been deprecated in favor of bookdown_open() imported from fabR package.

madshapR 1.0.0 (2023-06-20)

This package is a collection of wrapper functions used in data pipelines.

This is still a work in progress, so please let us know if you used a function before and is not working any longer.

Helper functions

madshapR_help() Call the help center for full documentation

functions to generate, shape and format meta data.

These functions allows to create, extract transform and apply meta data to a dataset.

Transform and shape:

data_dict_collapse(),data_dict_expand(),data_dict_filter(), data_dict_group_by(),data_dict_group_split(),data_dict_list_nest(), data_dict_pivot_longer(),data_dict_pivot_wider(),data_dict_ungroup()

extract/apply meta data:

data_dict_match_dataset(),data_dict_apply(), data_dict_extract()

evaluate and apply attributes:

as_data_dict(), as_data_dict_mlstr(),as_data_dict_shape(), is_data_dict(), is_data_dict_mlstr(), is_data_dict_shape() as_taxonomy(), is_taxonomy()

functions to generate, shape and format data.

These functions allows to create, extract transform data/meta data from a dataset. A dossier is a list of datasets.

evaluate and apply attributes:

as_dataset(), as_dossier() is_dataset(), is_dossier()

Extract/transform meta data: data_extract(), dossier_create(), dataset_zap_data_dict(), dataset_cat_as_labels()

Functions to work with data types

These functions allow user to work with, extract or assign data type (valueType) to values and/or dataset.

as_valueType(), is_valueType(), valueType_adjust(), valueType_guess(), valueType_self_adjust(), valueType_of()

Unit tests and QA for datasets and data dictionaries

These helper functions evaluate content of a dataset and/or data dictionary to extract from them irregularities or potential errors. These informations are stored in a tibble that can be use to assess inputs.

check_data_dict_categories(), check_data_dict_missing_categories(), check_data_dict_taxonomy(), check_data_dict_variables(), check_data_dict_valueType(), check_dataset_categories(), check_dataset_valueType(), check_dataset_variables(), check_name_standards()

Summarize information in dataset and data dictionaries

These helper functions evaluate content of a dataset and/or data dictionary to extract from them summary statistics and elements such as missing values, NA, category names, etc. These informations are stored in a tibble that can be use to summary inputs.

dataset_preprocess(), summary_variables(), summary_variables_categorical(),summary_variables_date(), summary_variables_numeric(),summary_variables_text()

Write and read excel and csv

read_csv_any_formats() The csv file is read twice to detect the number of lines to use in attributing the column type (guess_max parameter of read_csv). This avoids common errors when reading csv files.
read_excel_allsheets() The Excel file is read and the values are placed in a list of tibbles, with each sheet in a separate element in the list. If the Excel file has only one sheet, the output is a single tibble.
write_excel_allsheets() Write all Excel sheets using xlsx::write.xlsx() recursively.

Plot and summary functions used in a visual report

plot_bar(), plot_box(), plot_date(), plot_density(), plot_histogram(), plot_main_word(), plot_pie_valid_value(), summary_category(), summary_numerical(),summary_text()

aggregate information and generate reports

assess data

data_dict_evaluate() dataset_evaluate() dossier_evaluate()

summarize data

dataset_summarize() dossier_summarize()

visualize data

dataset_visualize() variable_visualize() open_visual_report()