Changes in version 2025-06-15 Attention: Some changes to functions in the current version of madshapR may require updates of existing code. Superseded object. | previous version (1.1.0 and older) | version 2.0.0 | |------------------------------------|-------------------| | madshapR_DEMO | madshapR_examples | Superseded parameters. | previous version (1.1.0 and older) | current version (2.0.0) | |----|----| | dataset_evaluate(as_data_dict_mlstr) | dataset_evaluate(is_data_dict_mlstr) | | data_dict_evaluate(as_data_dict_mlstr) | data_dict_evaluate(is_data_dict_mlstr) | | dossier_evaluate(as_data_dict_mlstr) | dossier_evaluate(is_data_dict_mlstr) | Superseded function behaviors and/or output structures. In dataset_evaluate(), data_dict_evaluate() and dossier_evaluate(), the columns generated in the outputs have been renamed as follows : | previous version (1.1.0 and older) | current version (2.0.0) | |------------------------------------|-------------------------------| | index | Index | | name | Variable name | | label | Variable label | | valueType | Data dictionary valueType | | Categories::label | Categories in data dictionary | | Categories::missing | Non-valid categories | In dataset_summarize() and dossier_summarize(), the columns generated in the outputs have been renamed as follows : | previous version (1.1.0 and older) | current version (2.0.0) | |------------------------------------|----------------------------| | index in data dict.name | Index | | name | Variable name | | label | Variable label | | Estimated dataset valueType | Suggested valueType | | Actual dataset valueType | Dataset valueType | | Total number of observations | Number of rows | | Nb. distinct values | Number of distinct values | | Nb. valid values | Number of valid values | | Nb. non-valid values | Number of non-valid values | | Nb. NA | Number of empty values | | % total Valid values | % Valid values | | % Non-valid values | % Non-valid values | | % NA | % Empty values | | ———————————— | ——————————— | Bug fixes and improvements - The package now handles the valueType datetime, which formerly was considered either as a text or date. https://github.com/maelstrom-research/madshapR/issues/123 https://github.com/maelstrom-research/madshapR/issues/112 https://github.com/maelstrom-research/madshapR/issues/75 - The valueType object (present in columns in a data dictionary or as an attribute of a variable) had some errors and bugs that have been corrected. https://github.com/maelstrom-research/madshapR/issues/87 https://github.com/maelstrom-research/madshapR/issues/82 https://github.com/maelstrom-research/madshapR/issues/81 https://github.com/maelstrom-research/madshapR/issues/76 - When a column in a dataset is all NA (empty), the previous version had some issues that have been has been corrected. https://github.com/maelstrom-research/madshapR/issues/116 https://github.com/maelstrom-research/madshapR/issues/115 https://github.com/maelstrom-research/madshapR/issues/109 - A bug in data_dict_pivot_longer() when ‘Source’ or ‘Target’ column was not present has been corrected. https://github.com/maelstrom-research/madshapR/issues/86 - The SPSS format, which haven package uses to produce labelled variables, define integers different form madshapR, which ultimately . That has been taken in account and corrected. - The SPSS format in the haven package used to produce labelled variables defines integers differently from madshapR, which was causing errors. The difference has been taken into account. https://github.com/maelstrom-research/madshapR/issues/83 The group_by parameter has been redesigned. - dataset_preprocess() now handles grouped dataset, using parameter “group_by”. - Users can now define groups in summaries and visual reports using a variable that is not categorical or has empty values. https://github.com/maelstrom-research/madshapR/issues/47 - Previously, the “group_by” argument had some flaws, resulting in bugs that have been corrected. https://github.com/maelstrom-research/madshapR/issues/114 https://github.com/maelstrom-research/madshapR/issues/113 https://github.com/maelstrom-research/madshapR/issues/110 https://github.com/maelstrom-research/madshapR/issues/105 Enhancements in the assessment and summary reports! - The assessment and summary reports had some updates, such as renamed columns and bug corrections. https://github.com/maelstrom-research/madshapR/issues/126 https://github.com/maelstrom-research/madshapR/issues/104 https://github.com/maelstrom-research/madshapR/issues/98 https://github.com/maelstrom-research/madshapR/issues/97 https://github.com/maelstrom-research/madshapR/issues/96 https://github.com/maelstrom-research/madshapR/issues/95 https://github.com/maelstrom-research/madshapR/issues/94 https://github.com/maelstrom-research/madshapR/issues/93 https://github.com/maelstrom-research/madshapR/issues/92 https://github.com/maelstrom-research/madshapR/issues/91 https://github.com/maelstrom-research/madshapR/issues/90 https://github.com/maelstrom-research/madshapR/issues/89 https://github.com/maelstrom-research/madshapR/issues/88 https://github.com/maelstrom-research/madshapR/issues/85 https://github.com/maelstrom-research/madshapR/issues/80 https://github.com/maelstrom-research/madshapR/issues/79 Enhancements in the visual reports! - The visual reports have been improved, including better visual outputs and color palettes, and new features such as total number of rows next to the bar charts. https://github.com/maelstrom-research/madshapR/issues/108 https://github.com/maelstrom-research/madshapR/issues/107 https://github.com/maelstrom-research/madshapR/issues/106 https://github.com/maelstrom-research/madshapR/issues/100 https://github.com/maelstrom-research/madshapR/issues/84 https://github.com/maelstrom-research/madshapR/issues/64 New functions - typeof_convert_to_valueType() converts typeof (and class if any) into its corresponding valueType. - valueType_convert_to_typeof() converts valueType into its corresponding typeof and class in R representation. - data_dict_update() updates a data dictionary from a dataset. - data_dict_trim_labels() adds shortened labels to data dictionary. - first_label_get() gets the first label from a data dictionary. - has_categories() tests if a dataset has categorical variables. Changes in version 1.1.0 (2024-04-23) Bug fixes and improvements - for assessment, summary and visualization, the character columns in dataset are put to lower to avoid duplicated informations in outputs. https://github.com/maelstrom-research/madshapR/issues/63 - bug in the function variable_visualize() when the column was empty after removing internally stopwords. https://github.com/maelstrom-research/Rmonize/issues/53 https://github.com/maelstrom-research/Rmonize/issues/49 - Some elements were missing in the reports in dataset_evaluate() https://github.com/maelstrom-research/madshapR/issues/66 - Problem with names containing underscores in variables when visualized fixed. https://github.com/maelstrom-research/madshapR/issues/62 - Functions involving valueType (such as data_dict_apply(),valueType_guess() and valueType_adujst()) have been corrected to be more consistent in the usage of these functions. https://github.com/maelstrom-research/madshapR/issues/61 - The bug affecting tibbles which contain a variable named “test” has been corrected in the package fabR. https://github.com/maelstrom-research/madshapR/issues/60 - functions such as data_dict_summarize() and dataset_evaluate() have cells in tibble generated that can have more than accepted characters in a cell in Excel. the function truncates the cells in tibbles to a maximum of 10000 characters. https://github.com/maelstrom-research/madshapR/issues/59 - Problem with dataType in the function dataset_cat_as_labels() when the values found in the dataset are not in the data dictionary, and the valueType is text, and the dataType is “integer” has been fixed. https://github.com/maelstrom-research/madshapR/issues/58 - Functions involving date formatted variables have been corrected in the package fabR. https://github.com/maelstrom-research/madshapR/issues/57 - The inconsistent error in dataset_evaluate() has been corrected in the package fabR. https://github.com/maelstrom-research/madshapR/issues/46 deprecated functions To avoid confusion with help(function), the function madshapR_help() has been renamed madshapR_website(). Dependency changes - set a minimum dplyr dependence to avoid bugs Changes in version 1.0.3 (2023-12-19) Bug fixes and improvements Some of the tests were made with another package (Rmonize) which as “madshapR” as a dependence. Enhance reports - in visual reports, void confusing changes in color scheme in visual reports. - Histograms for date variables display valid ranges. - in reports, change % NA as proportion in reports. - dossier_visualize() report shows variable labels in the same lang. - in visual reports, the bar plot only appears when there are multiple missing value types, otherwise only the pie chart is shown. - in reports, all of the percentages are now included under “Other values (non categorical)”, which gives a single value. - https://github.com/maelstrom-research/madshapR/issues/51 suppress overwrite parameter in dataset_visualize(). - https://github.com/maelstrom-research/madshapR/issues/42 in dataset_summary() minor issue (consistency in column names and content). - correction of the function variable_visualize() when valueType_guess = TRUE Correct Data dictionary functions - https://github.com/maelstrom-research/madshapR/issues/50 enhance the function check_data_dict_valueType(), which was too slow. - https://github.com/maelstrom-research/madshapR/issues/49 valueType_adjust() now works with empty column (all NAs) - allow the format date to be transformed into text in dataset_zap_data_dict() when the format is unclear. New functions - col_id() function which is a short cut for calling the attribute madshapR::col_id of a dataset. - as_category(),is_category(),drop_category() function which coerces a vector as a categorical object. Typically a column in a dataset that needs to be coerced into a categorical variable (The data dictionary is updated accordingly). Deprecated functions - Rename and update example rda Object (in data) of DEMO_files into madshapR_DEMO for consistency across our other packages. Changes in version 1.0.2 (2023-10-09) Creation of NEWS feed !! Addition of NEWS.md for the development version use “(development version)”. Bug fixes and improvements - Some improvements in the documentation of the package has been made. - internal call of libraries (using ::) has been replaced by proper import in the declaration function. - get functions in fabR have been changed in its last release. the functions using them as dependencies ( check_xxx()) have been updated accordingly. - DEMO files no longer include harmonization files that are now in the package harmonizR Dependency changes New Imports: haven, lifecycle No longer in Imports: xfun New functions These functions are imported from fabR - bookdown_template() replaces the deprecated function bookdown_template(). - bookdown_render() which renders a Rmd collection of files into a docs/index.html website. - bookdown_open() Which allows to open a docs/index.html document when the bookdown is rendered This separation into 3 functions will allow future developments, such as render as a ppt or pdf. deprecated functions Due to another package development (see fabR), The function open_visual_report() has been deprecated in favor of bookdown_open() imported from fabR package. Changes in version 1.0.0 (2023-06-20) This package is a collection of wrapper functions used in data pipelines. This is still a work in progress, so please let us know if you used a function before and is not working any longer. Helper functions - madshapR_help() Call the help center for full documentation functions to generate, shape and format meta data. These functions allows to create, extract transform and apply meta data to a dataset. - Transform and shape: data_dict_collapse(),data_dict_expand(),data_dict_filter(), data_dict_group_by(),data_dict_group_split(),data_dict_list_nest(), data_dict_pivot_longer(),data_dict_pivot_wider(),data_dict_ungroup() - extract/apply meta data: data_dict_match_dataset(),data_dict_apply(), data_dict_extract() - evaluate and apply attributes: as_data_dict(), as_data_dict_mlstr(),as_data_dict_shape(), is_data_dict(), is_data_dict_mlstr(), is_data_dict_shape() as_taxonomy(), is_taxonomy() functions to generate, shape and format data. These functions allows to create, extract transform data/meta data from a dataset. A dossier is a list of datasets. - evaluate and apply attributes: as_dataset(), as_dossier() is_dataset(), is_dossier() - Extract/transform meta data: data_extract(), dossier_create(), dataset_zap_data_dict(), dataset_cat_as_labels() Functions to work with data types These functions allow user to work with, extract or assign data type (valueType) to values and/or dataset. as_valueType(), is_valueType(), valueType_adjust(), valueType_guess(), valueType_self_adjust(), valueType_of() Unit tests and QA for datasets and data dictionaries These helper functions evaluate content of a dataset and/or data dictionary to extract from them irregularities or potential errors. These informations are stored in a tibble that can be use to assess inputs. check_data_dict_categories(), check_data_dict_missing_categories(), check_data_dict_taxonomy(), check_data_dict_variables(), check_data_dict_valueType(), check_dataset_categories(), check_dataset_valueType(), check_dataset_variables(), check_name_standards() Summarize information in dataset and data dictionaries These helper functions evaluate content of a dataset and/or data dictionary to extract from them summary statistics and elements such as missing values, NA, category names, etc. These informations are stored in a tibble that can be use to summary inputs. dataset_preprocess(), summary_variables(), summary_variables_categorical(),summary_variables_date(), summary_variables_numeric(),summary_variables_text() Write and read excel and csv - read_csv_any_formats() The csv file is read twice to detect the number of lines to use in attributing the column type (guess_max parameter of read_csv). This avoids common errors when reading csv files. - read_excel_allsheets() The Excel file is read and the values are placed in a list of tibbles, with each sheet in a separate element in the list. If the Excel file has only one sheet, the output is a single tibble. - write_excel_allsheets() Write all Excel sheets using xlsx::write.xlsx() recursively. Plot and summary functions used in a visual report plot_bar(), plot_box(), plot_date(), plot_density(), plot_histogram(), plot_main_word(), plot_pie_valid_value(), summary_category(), summary_numerical(),summary_text() aggregate information and generate reports - assess data data_dict_evaluate() dataset_evaluate() dossier_evaluate() - summarize data dataset_summarize() dossier_summarize() - visualize data dataset_visualize() variable_visualize() open_visual_report()