                 Changes in version 2.0.0 (2025-06-15)                  

Attention: Some changes to functions in the current version of madshapR
may require updates of existing code.

Superseded object.

| previous version (1.1.0 and older) | version 2.0.0     |
|------------------------------------|-------------------|
| madshapR_DEMO                      | madshapR_examples |

Superseded parameters.

| previous version (1.1.0 and older)     | current version (2.0.0)                |
|----------------------------------------|----------------------------------------|
| dataset_evaluate(as_data_dict_mlstr)   | dataset_evaluate(is_data_dict_mlstr)   |
| data_dict_evaluate(as_data_dict_mlstr) | data_dict_evaluate(is_data_dict_mlstr) |
| dossier_evaluate(as_data_dict_mlstr)   | dossier_evaluate(is_data_dict_mlstr)   |
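
For scripts that build their argument lists programmatically, the
parameter rename can be handled in one place. The following is a minimal
sketch in base R; rename_superseded_arg() and my_dataset are
hypothetical names used for illustration and are not part of madshapR.

```r
# Hypothetical helper (not part of madshapR): rename a superseded argument
# in a named list of arguments, e.g. before passing the list to do.call().
rename_superseded_arg <- function(args,
                                  old = "as_data_dict_mlstr",
                                  new = "is_data_dict_mlstr") {
  if (is.null(names(args))) return(args)  # unnamed list: nothing to rename
  names(args)[names(args) == old] <- new
  args
}

old_args <- list(as_data_dict_mlstr = TRUE)
new_args <- rename_superseded_arg(old_args)
names(new_args)

# With madshapR 2.0.0 installed, and `my_dataset` standing in for your own
# dataset, the updated list could then be used as:
# do.call(madshapR::dataset_evaluate, c(list(my_dataset), new_args))
```

In most scripts a plain search and replace of the parameter name is
enough; the helper is only useful when arguments are stored in lists.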

Superseded function behaviors and/or output structures.

In dataset_evaluate(), data_dict_evaluate() and dossier_evaluate(), the
columns generated in the outputs have been renamed as follows:

| previous version (1.1.0 and older) | current version (2.0.0)       |
|------------------------------------|-------------------------------|
| index                              | Index                         |
| name                               | Variable name                 |
| label                              | Variable label                |
| valueType                          | Data dictionary valueType     |
| Categories::label                  | Categories in data dictionary |
| Categories::missing                | Non-valid categories          |
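
Code that selects report columns by their old names will need the new
names. One possible way to bridge the two, sketched in base R, is a
lookup vector built from the table above. The helper
rename_report_columns() and the small old_report data frame are
hypothetical and only illustrate the idea; they are not produced by
madshapR.

```r
# Old -> new column names, copied from the table above.
evaluate_renames <- c(
  "index"               = "Index",
  "name"                = "Variable name",
  "label"               = "Variable label",
  "valueType"           = "Data dictionary valueType",
  "Categories::label"   = "Categories in data dictionary",
  "Categories::missing" = "Non-valid categories"
)

# Hypothetical helper: rename any old-style columns found in a table and
# leave all other columns untouched.
rename_report_columns <- function(tbl, renames) {
  hit <- names(tbl) %in% names(renames)
  names(tbl)[hit] <- unname(renames[names(tbl)[hit]])
  tbl
}

# Made-up stand-in for a report table saved with version 1.1.0 or older.
old_report <- data.frame(index = 1, name = "age", label = "Age (years)",
                         check.names = FALSE)
names(rename_report_columns(old_report, evaluate_renames))
# "Index" "Variable name" "Variable label"
```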

In dataset_summarize() and dossier_summarize(), the columns generated in
the outputs have been renamed as follows:

| previous version (1.1.0 and older) | current version (2.0.0)    |
|------------------------------------|----------------------------|
| index in data dict.name            | Index                      |
| name                               | Variable name              |
| label                              | Variable label             |
| Estimated dataset valueType        | Suggested valueType        |
| Actual dataset valueType           | Dataset valueType          |
| Total number of observations       | Number of rows             |
| Nb. distinct values                | Number of distinct values  |
| Nb. valid values                   | Number of valid values     |
| Nb. non-valid values               | Number of non-valid values |
| Nb. NA                             | Number of empty values     |
| % total Valid values               | % Valid values             |
| % Non-valid values                 | % Non-valid values         |
| % NA                               | % Empty values             |

Bug fixes and improvements

  - The package now handles the valueType datetime, which was previously
    treated as either text or date.

https://github.com/maelstrom-research/madshapR/issues/123

https://github.com/maelstrom-research/madshapR/issues/112

https://github.com/maelstrom-research/madshapR/issues/75

  - The valueType object (present in columns in a data dictionary or as
    an attribute of a variable) had some errors and bugs that have been
    corrected.

https://github.com/maelstrom-research/madshapR/issues/87

https://github.com/maelstrom-research/madshapR/issues/82

https://github.com/maelstrom-research/madshapR/issues/81

https://github.com/maelstrom-research/madshapR/issues/76

  - When a column in a dataset is all NA (empty), the previous version
    had some issues that have been corrected.

https://github.com/maelstrom-research/madshapR/issues/116

https://github.com/maelstrom-research/madshapR/issues/115

https://github.com/maelstrom-research/madshapR/issues/109

  - A bug in data_dict_pivot_longer() when the ‘Source’ or ‘Target’
    column was not present has been corrected.

https://github.com/maelstrom-research/madshapR/issues/86

  - The SPSS format, which the haven package uses to produce labelled
    variables, defines integers differently from madshapR, which was
    causing errors. The difference is now taken into account.

https://github.com/maelstrom-research/madshapR/issues/83

The group_by parameter has been redesigned.

  - dataset_preprocess() now handles grouped datasets through the
    “group_by” parameter.

  - Users can now define groups in summaries and visual reports using a
    variable that is not categorical or has empty values.

https://github.com/maelstrom-research/madshapR/issues/47

  - Previously, the “group_by” argument had some flaws, resulting in
    bugs that have been corrected.

https://github.com/maelstrom-research/madshapR/issues/114

https://github.com/maelstrom-research/madshapR/issues/113

https://github.com/maelstrom-research/madshapR/issues/110

https://github.com/maelstrom-research/madshapR/issues/105

Enhancements in the assessment and summary reports!

  - The assessment and summary reports have been updated, with renamed
    columns and bug corrections.

https://github.com/maelstrom-research/madshapR/issues/126

https://github.com/maelstrom-research/madshapR/issues/104

https://github.com/maelstrom-research/madshapR/issues/98

https://github.com/maelstrom-research/madshapR/issues/97

https://github.com/maelstrom-research/madshapR/issues/96

https://github.com/maelstrom-research/madshapR/issues/95

https://github.com/maelstrom-research/madshapR/issues/94

https://github.com/maelstrom-research/madshapR/issues/93

https://github.com/maelstrom-research/madshapR/issues/92

https://github.com/maelstrom-research/madshapR/issues/91

https://github.com/maelstrom-research/madshapR/issues/90

https://github.com/maelstrom-research/madshapR/issues/89

https://github.com/maelstrom-research/madshapR/issues/88

https://github.com/maelstrom-research/madshapR/issues/85

https://github.com/maelstrom-research/madshapR/issues/80

https://github.com/maelstrom-research/madshapR/issues/79

Enhancements in the visual reports!

  - The visual reports have been improved, including better visual
    outputs and color palettes, and new features such as the total
    number of rows displayed next to the bar charts.

https://github.com/maelstrom-research/madshapR/issues/108

https://github.com/maelstrom-research/madshapR/issues/107

https://github.com/maelstrom-research/madshapR/issues/106

https://github.com/maelstrom-research/madshapR/issues/100

https://github.com/maelstrom-research/madshapR/issues/84

https://github.com/maelstrom-research/madshapR/issues/64

New functions

  - typeof_convert_to_valueType() converts typeof (and class if any)
    into its corresponding valueType.

  - valueType_convert_to_typeof() converts valueType into its
    corresponding typeof and class in R representation.

  - data_dict_update() updates a data dictionary from a dataset.

  - data_dict_trim_labels() adds shortened labels to a data dictionary.

  - first_label_get() gets the first label from a data dictionary.

  - has_categories() tests if a dataset has categorical variables.

                 Changes in version 1.1.0 (2024-04-23)                  

Bug fixes and improvements

  - For assessment, summary and visualization, the character columns in
    the dataset are converted to lowercase to avoid duplicated
    information in the outputs.

https://github.com/maelstrom-research/madshapR/issues/63

  - A bug in variable_visualize() that occurred when a column was empty
    after stopwords were removed internally has been corrected.

https://github.com/maelstrom-research/Rmonize/issues/53

https://github.com/maelstrom-research/Rmonize/issues/49

  - Some elements that were missing from the reports in
    dataset_evaluate() have been added.

https://github.com/maelstrom-research/madshapR/issues/66

  - A problem with variable names containing underscores in
    visualizations has been fixed.

https://github.com/maelstrom-research/madshapR/issues/62

  - Functions involving valueType (such as data_dict_apply(),
    valueType_guess() and valueType_adjust()) have been corrected so
    that they behave more consistently.

https://github.com/maelstrom-research/madshapR/issues/61

  - The bug affecting tibbles which contain a variable named “test” has
    been corrected in the package fabR.

https://github.com/maelstrom-research/madshapR/issues/60

  - Functions such as data_dict_summarize() and dataset_evaluate() can
    generate tibble cells containing more characters than Excel accepts
    in a cell. These functions now truncate the cells in tibbles to a
    maximum of 10000 characters.

https://github.com/maelstrom-research/madshapR/issues/59
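
The truncation described above can be sketched in base R as follows.
This is an illustration of the idea only, not the package's internal
code; truncate_cells() and the report data frame are hypothetical names.

```r
# Hypothetical helper: cap every character column at max_chars characters.
# Non-character columns and NA values are left unchanged.
truncate_cells <- function(tbl, max_chars = 10000) {
  is_chr <- vapply(tbl, is.character, logical(1))
  tbl[is_chr] <- lapply(tbl[is_chr], substr, start = 1, stop = max_chars)
  tbl
}

# Made-up example: the second note is longer than the limit.
report <- data.frame(id = 1:2,
                     note = c("short", strrep("x", 12000)),
                     stringsAsFactors = FALSE)
nchar(truncate_cells(report)$note)
# 5 10000
```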

  - A problem with dataType in dataset_cat_as_labels(), occurring when
    the values found in the dataset are not in the data dictionary, the
    valueType is text and the dataType is “integer”, has been fixed.

https://github.com/maelstrom-research/madshapR/issues/58

  - Functions involving date formatted variables have been corrected in
    the package fabR.

https://github.com/maelstrom-research/madshapR/issues/57

  - The inconsistent error in dataset_evaluate() has been corrected in
    the package fabR.

https://github.com/maelstrom-research/madshapR/issues/46

Deprecated functions

To avoid confusion with help(function), the function madshapR_help() has
been renamed madshapR_website().

Dependency changes

  - A minimum version of dplyr is now required to avoid bugs.

                 Changes in version 1.0.3 (2023-12-19)                  

Bug fixes and improvements

Some of the tests were made with another package (Rmonize) which has
“madshapR” as a dependency.

Enhance reports

  - In visual reports, confusing changes in the color scheme are now
    avoided.

  - Histograms for date variables display valid ranges.

  - In reports, % NA is now expressed as a proportion.

  - The dossier_visualize() report shows variable labels in the same
    language.

  - In visual reports, the bar plot only appears when there are multiple
    missing value types, otherwise only the pie chart is shown.

  - In reports, all of the percentages are now included under “Other
    values (non categorical)”, which gives a single value.

  - https://github.com/maelstrom-research/madshapR/issues/51

The overwrite parameter in dataset_visualize() has been removed.

  - https://github.com/maelstrom-research/madshapR/issues/42

A minor issue in dataset_summary() (consistency in column names and
content) has been corrected.

  - variable_visualize() has been corrected for the case where
    valueType_guess = TRUE.

Correct Data dictionary functions

  - https://github.com/maelstrom-research/madshapR/issues/50

The function check_data_dict_valueType(), which was too slow, has been
enhanced.

  - https://github.com/maelstrom-research/madshapR/issues/49

valueType_adjust() now works with empty columns (all NAs).

  - Variables in date format can now be transformed into text in
    dataset_zap_data_dict() when the format is unclear.

New functions

  - col_id() is a shortcut for calling the attribute madshapR::col_id of
    a dataset.

  - as_category(), is_category() and drop_category() handle the coercion
    of a vector into a categorical object, typically a column in a
    dataset that needs to be coerced into a categorical variable (the
    data dictionary is updated accordingly).

Deprecated functions

  - The example rda object (in data) DEMO_files has been renamed
    madshapR_DEMO and updated, for consistency across our other
    packages.

                 Changes in version 1.0.2 (2023-10-09)                  

Creation of NEWS feed!

NEWS.md has been added. The development version is labelled
“(development version)”.

Bug fixes and improvements

  - Some improvements in the documentation of the package have been
    made.

  - Internal calls to libraries (using ::) have been replaced by proper
    imports in the function declarations.

  - The get functions in fabR were changed in its latest release. The
    functions that depend on them (check_xxx()) have been updated
    accordingly.

  - DEMO files no longer include harmonization files, which are now in
    the package harmonizR.

Dependency changes

New Imports: haven, lifecycle

No longer in Imports: xfun

New functions

These functions are imported from fabR

  - bookdown_template() replaces the deprecated function
    bookdown_template().

  - bookdown_render() renders a collection of Rmd files into a
    docs/index.html website.

  - bookdown_open() opens a docs/index.html document once the bookdown
    is rendered.

This separation into 3 functions will allow future developments, such as
rendering as ppt or pdf.

Deprecated functions

Due to the development of another package (see fabR), the function
open_visual_report() has been deprecated in favor of bookdown_open(),
imported from the fabR package.

                 Changes in version 1.0.0 (2023-06-20)                  

This package is a collection of wrapper functions used in data
pipelines.

This is still a work in progress, so please let us know if a function
you used before no longer works.

Helper functions

  - madshapR_help() Call the help center for full documentation

Functions to generate, shape and format meta data.

These functions allow users to create, extract, transform and apply meta
data to a dataset.

  - Transform and shape:

data_dict_collapse(), data_dict_expand(), data_dict_filter(),
data_dict_group_by(), data_dict_group_split(), data_dict_list_nest(),
data_dict_pivot_longer(), data_dict_pivot_wider(), data_dict_ungroup()

  - Extract/apply meta data:

data_dict_match_dataset(), data_dict_apply(), data_dict_extract()

  - Evaluate and apply attributes:

as_data_dict(), as_data_dict_mlstr(), as_data_dict_shape(),
is_data_dict(), is_data_dict_mlstr(), is_data_dict_shape(),
as_taxonomy(), is_taxonomy()

Functions to generate, shape and format data.

These functions allow users to create, extract and transform data/meta
data from a dataset. A dossier is a list of datasets.

  - Evaluate and apply attributes:

as_dataset(), as_dossier(), is_dataset(), is_dossier()

  - Extract/transform meta data: data_extract(), dossier_create(),
    dataset_zap_data_dict(), dataset_cat_as_labels()

Functions to work with data types

These functions allow users to work with, extract or assign a data type
(valueType) to values and/or datasets.

as_valueType(), is_valueType(), valueType_adjust(), valueType_guess(),
valueType_self_adjust(), valueType_of()

Unit tests and QA for datasets and data dictionaries

These helper functions evaluate the content of a dataset and/or data
dictionary to extract irregularities or potential errors. This
information is stored in a tibble that can be used to assess inputs.

check_data_dict_categories(), check_data_dict_missing_categories(),
check_data_dict_taxonomy(), check_data_dict_variables(),
check_data_dict_valueType(), check_dataset_categories(),
check_dataset_valueType(), check_dataset_variables(),
check_name_standards()

Summarize information in dataset and data dictionaries

These helper functions evaluate the content of a dataset and/or data
dictionary to extract summary statistics and elements such as missing
values, NA, category names, etc. This information is stored in a tibble
that can be used to summarize inputs.

dataset_preprocess(), summary_variables(),
summary_variables_categorical(), summary_variables_date(),
summary_variables_numeric(), summary_variables_text()

Write and read excel and csv

  - read_csv_any_formats() The csv file is read twice to detect the
    number of lines to use in attributing the column type (guess_max
    parameter of read_csv). This avoids common errors when reading csv
    files.

  - read_excel_allsheets() The Excel file is read and the values are
    placed in a list of tibbles, with each sheet in a separate element
    in the list. If the Excel file has only one sheet, the output is a
    single tibble.

  - write_excel_allsheets() Write all Excel sheets using
    xlsx::write.xlsx() recursively.
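
The two-pass reading described for read_csv_any_formats() can be
sketched with readr as shown below. This is an assumption about the
general technique, not the package's actual source code;
read_csv_full_guess() is a hypothetical name. By default readr guesses
column types from at most the first 1000 rows, so a column that is empty
at the top of the file can be given the wrong type.

```r
library(readr)

# Hypothetical helper, not the madshapR source: read a csv file twice so
# that column types are guessed from every row of the file.
read_csv_full_guess <- function(path) {
  n_lines <- length(read_lines(path))  # first pass: count the lines
  # second pass: parse the file, using all lines to guess column types
  read_csv(path, guess_max = n_lines, show_col_types = FALSE)
}

# Demo file: column x is empty for 1500 rows, then holds one number.
tmp <- tempfile(fileext = ".csv")
write_csv(data.frame(id = 1:1501, x = c(rep(NA, 1500), 2.5)), tmp)

full <- read_csv_full_guess(tmp)
class(full$x)
```

Counting lines first costs one extra read of the file, which is the
trade-off for not having to hard-code a large guess_max value.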

Plot and summary functions used in a visual report

plot_bar(), plot_box(), plot_date(), plot_density(), plot_histogram(),
plot_main_word(), plot_pie_valid_value(), summary_category(),
summary_numerical(), summary_text()

Aggregate information and generate reports

  - Assess data:

data_dict_evaluate(), dataset_evaluate(), dossier_evaluate()

  - Summarize data:

dataset_summarize(), dossier_summarize()

  - Visualize data:

dataset_visualize(), variable_visualize(), open_visual_report()