Dplyr summarize count if

11/5/2023

If you'd like to follow along, the last page of this article has instructions on how to download and wrangle the data to get the same data set I'm using. The data has one row for each survey response, and the four columns are all characters:

$ LanguageWorkedWith: chr "HTML/CSS Java JavaScript Python" "C++ HTML/CSS Python" "HTML/CSS" "C C++ C# Python SQL"
$ LanguageGroup     : chr "Python" "Python" "Neither" "Python"

I filtered the raw data to make the crosstabs more manageable. That included removing missing values and keeping only the two largest genders, Man and Woman.

So, what's the gender breakdown within each language group? For this type of reporting in a data frame, one of my go-to tools is the janitor package's tabyl() function.

The basic tabyl() function returns a data frame with counts. The first column name you add to a tabyl() argument becomes the row, and the second one becomes the column.

What's nice about tabyl() is that it's very easy to generate percents, too. If you want to see percents for each column instead of raw totals, add adorn_percentages("col"). You can then pipe those results into a formatting function such as adorn_pct_formatting():

tabyl(mydata, Gender, LanguageGroup) %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 1)

To see percents by row, use adorn_percentages("row") instead.

If you want to add a third variable, such as Hobbyist, that's easy too:

tabyl(mydata, Gender, LanguageGroup, Hobbyist) %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 1)

This code returns a list with one data frame for each value of the third variable, such as $No. However, it gets a little harder to visually compare results across more than two levels this way.

The CGPfunctions package is worth a look for some quick and easy ways to visualize crosstab data. Install it from CRAN with the usual install.packages("CGPfunctions"). The package has two functions of interest for examining crosstabs: PlotXTabs() and PlotXTabs2(). This code returns bar graphs of the data (first graph below):

library(CGPfunctions)
PlotXTabs(mydata)

[Screen shot by Sharon Machlis, IDG]

PlotXTabs2(mydata) creates a graph with a different look, plus some statistical summaries (second graph at left). If you don't need or want those summaries, you can remove them with results.subtitle = FALSE, for example:

PlotXTabs2(mydata, LanguageGroup, Gender, results.subtitle = FALSE)

PlotXTabs2() has a couple of dozen argument options, including title, caption, legends, color scheme, and one of four plot types: side, stack, mosaic, or percent. There are also options familiar to ggplot2 users, such as ggtheme and palette. You can see more details in the function's help file.

The vtree package generates tree-style graphics for crosstabs rather than conventional graphs. Running the main vtree() function on one variable, such as

library(vtree)
vtree(mydata, "LanguageGroup")

gets you this basic response:

[Image: Sharon Machlis, IDG]

I'm not keen on the color defaults here, but you can swap in an RColorBrewer palette. vtree's palette argument uses palette numbers, not names; you can see how they're numbered in the vtree package documentation. I could choose 3 for Greens and 5 for Purples, for example.
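If janitor isn't installed, you can approximate the tabyl() workflow described earlier in base R with table() and prop.table(). This is a swapped-in technique, not janitor's implementation. The small mydata data frame below is made-up toy data that only mimics the article's Gender, LanguageGroup and Hobbyist columns, and fmt_pct() is a hypothetical helper introduced for this sketch.

```r
# Toy stand-in for the article's survey data (values are made up,
# not taken from the real survey).
mydata <- data.frame(
  Gender = c("Man", "Man", "Woman", "Man", "Woman", "Woman", "Man", "Man"),
  LanguageGroup = c("Python", "R", "Both", "Python", "R", "Neither", "Both", "Python"),
  Hobbyist = c("Yes", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Counts: the first variable becomes the rows and the second the columns,
# the same layout tabyl(mydata, Gender, LanguageGroup) uses.
counts <- table(mydata$Gender, mydata$LanguageGroup)

# Column percents: each cell divided by its column total,
# analogous to adorn_percentages("col").
col_pct <- prop.table(counts, margin = 2)

# Row percents: each cell divided by its row total,
# analogous to adorn_percentages("row").
row_pct <- prop.table(counts, margin = 1)

# Hypothetical helper: format proportions as percent strings with one
# decimal place, similar in spirit to adorn_pct_formatting(digits = 1).
fmt_pct <- function(x) {
  matrix(sprintf("%.1f%%", 100 * as.vector(x)),
         nrow = nrow(x), dimnames = dimnames(x))
}
col_pct_fmt <- fmt_pct(col_pct)

# Three-way breakdown: one Gender x LanguageGroup column-percent table
# per Hobbyist value, returned as a named list (names "No" and "Yes").
by_hobbyist <- lapply(
  split(mydata, mydata$Hobbyist),
  function(d) prop.table(table(d$Gender, d$LanguageGroup), margin = 2)
)

print(col_pct_fmt)
print(by_hobbyist)
```

One design difference from tabyl() is that table() returns a contingency-table object rather than a data frame. Also, with character columns, each per-Hobbyist table includes only the language groups that actually occur in that subset. Converting the columns to factors first would give every table the same shape, although empty columns would then show NaN percents.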
