Posts

Showing posts from February, 2024

Knowing Something vs. Knowing the Name of Something: Some Points about Causal Analysis

The famed physicist Richard Feynman once said, “I learned very early the difference between knowing the name of something and knowing something,” a lesson from his father. I think too often we in the statistics/machine learning field are guilty of “only knowing the name of something.” Well, in most ... Continue reading: Knowing Something vs. Knowing the Name of Something: Some Points about Causal Analysis http://dlvr.it/T3SR99

Navigating the Enchanted Forest of Data: A Tale of Reproducible Research with RMarkdown

Gathering Your Gear: Preparing for the Quest with RMarkdown. Standing at the threshold of the Enchanted Forest of Data, you’re about to embark on a journey where insight and clarity emerge from the shadows of complexity. Your guide through this mystical realm is RMarkdown, a powerful tool that weaves the ... Continue reading: Navigating the Enchanted Forest of Data: A Tale of Reproducible Research with RMarkdown http://dlvr.it/T3SR1d

Unlocking Efficiency: How to Set a Data Frame Column as Index in R

Introduction: In the realm of data manipulation and analysis, efficiency is paramount. One powerful technique to enhance your workflow is setting a column in a data frame as the index. This seemingly simple task can unlock a plethora of benefits,... Continue reading: Unlocking Efficiency: How to Set a Data Frame Column as Index in R http://dlvr.it/T3Rjzt

Reproducible data science with Nix, part 10 — contributing to nixpkgs

I’ve very recently started contributing to the nixpkgs repository of packages, which contains all the packages you can install from the Nix package manager. My contributions are fairly modest: I help fix R packages that need some tweaking to make them successfully build for Nix. Most of these fixes ... Continue reading: Reproducible data science with Nix, part 10 — contributing to nixpkgs http://dlvr.it/T3RjZb

Escape the Spreadsheet Inferno: Switch to Shiny for Clinical Trial Reporting

Data management in the pharmaceutical industry presents unique challenges, often compounded by the sheer volume and complexity of clinical trial data. Traditional methods, particularly spreadsheet-based approaches, have long been the norm, yet they pose significant drawbacks. Interested in advancing clinical research with dynamic reports? Uncover how Shiny and Quarto can ... Continue reading: Escape the Spreadsheet Inferno: Switch to Shiny for Clinical Trial Reporting http://dlvr.it/T3Q7tY

What Good is Analysis of Variance?

Introduction: I’d like to demonstrate what “analysis of variance” (often abbreviated as “anova” or “aov”) does for you as a data scientist or analyst. After reading this note you should be able to determine how an analysis of variance style calculation can or cannot help with your project. (... Continue reading: What Good is Analysis of Variance? http://dlvr.it/T3PH2z

The R Consortium 2023: A Year of Growth and Innovation

Excerpted from the Annual Report Access the annual report here! Letter from the Chair — Mehar Pratap Singh, Chairman Welcome to the 2023 Annual Report of the R Consortium. This... The post The R Consortium 2023: A Year of Growth and Innovation appeared first on R Consortium. Continue reading: The R Consortium 2023: A Year of Growth and Innovation http://dlvr.it/T3NkPW

Key advantages of using the keyring package

Does your package need the user to provide secrets, like API tokens, to work? Have you considered telling your package users about the keyring package, or even forcing them to use it? The keyring package maintained by Gábor Csárdi is a package that acc... Continue reading: Key advantages of using the keyring package http://dlvr.it/T3Nk5c

Homicide Rates from Gender Perspective: Analysis using Radar Chart and Bootstrap Intervals

Levels of violence in a region are an essential indicator of the peace and security its countries have achieved. Fortunately, the global homicide rate has been decreasing, albeit slowly. But as for men, the situation does not look so bright. The global homicide rate per 100,000 people is about four times ... Continue reading: Homicide Rates from Gender Perspective: Analysis using Radar Chart and Bootstrap Intervals http://dlvr.it/T3MvYZ

Demystifying the melt() Function in R

Introduction: The melt() function in the data.table package is an extremely useful tool for reshaping datasets in R. However, for beginners, understanding how to use melt() can be tricky. In this post, I’ll walk through several examples to demons... Continue reading: Demystifying the melt() Function in R http://dlvr.it/T3L0r9

PowerQuery Puzzle solved with R

#159–160. Puzzles. Author: ExcelBI. All files (xlsx with puzzle and R with solution) for each and every puzzle are available on my Github. Enjoy. Puzzle #159: Today we have data about some salespeople. But they have only data about the months when they exceeded... Continue reading: PowerQuery Puzzle solved with R http://dlvr.it/T3JnKb

R Solution for Excel Puzzles

Puzzles no. 394–398. Puzzles. Author: ExcelBI. All files (xlsx with puzzle and R with solution) for each and every puzzle are available on my Github. Enjoy. Puzzle #394: As you probably already noticed, challenges with palindromes are pretty common here. But usu... Continue reading: R Solution for Excel Puzzles http://dlvr.it/T3Jn36

R For SEO Part 5: Common Excel Formulas In R

Welcome back. It’s part 5 of my R for SEO series and I hope you’re all finding it useful so far. Up to now, we’ve covered the basics, using packages and Google Analytics & Search Console, data visualisation with ggplot2 ... Continue reading: R For SEO Part 5: Common Excel Formulas In R http://dlvr.it/T3J78M

Pairwise comparisons in nonlinear regression

Pairwise comparisons are one of the most debated topics in agricultural research: they are very often used and, sometimes, abused in the literature. I have nothing against the appropriate use of this very useful technique and, for those who are interest... Continue reading: Pairwise comparisons in nonlinear regression http://dlvr.it/T3Hw0C

Level Up Your R/Shiny Skills with Appsilon’s Tailored Workshops

R/Shiny has established itself as a cornerstone technology for creating interactive, data-driven web applications. Recognising the potential challenges and opportunities this presents, Appsilon offers an exclusive series of workshops designed to enhance the skills of R/Shiny teams, from beginners to advanced developers in the life sciences. Why R/... Continue reading: Level Up Your R/Shiny Skills with Appsilon’s Tailored Workshops http://dlvr.it/T2kfq0

Ann Arbor R User Group: Harnessing the Power of R and GitHub

The R Consortium talked to Barry Decicco, founder and organizer of the Ann Arbor R User Group, based in Ann Arbor, Michigan. Barry shared his experience working with R as... The post Ann Arbor R User Group: Harnessing the Power of R and GitHub appeared first on R Consortium. Continue reading: Ann Arbor R User Group: Harnessing the Power of R and GitHub http://dlvr.it/T2jWd5

PowerQuery Puzzle solved with R

#153–156. Puzzles. Author: ExcelBI. All files (xlsx with puzzle and R with solution) for each and every puzzle are available on my Github. Enjoy. Similarly to the Excel Puzzles, we have a double episode today because of my winter holidays. Do not worry. Puzzles... Continue reading: PowerQuery Puzzle solved with R http://dlvr.it/T2jWTJ

Be kind don’t rbind

The other day I was helping to refactor an R package and came across one of the biggest performance blockers there is: dynamically growing matrices. Of course I repeated the mantra “Always preallocate your variables” but in this case, it is not ... Continue reading: Be kind don’t rbind http://dlvr.it/T2hxcS
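The preallocation mantra mentioned in this excerpt can be illustrated with a minimal base R sketch. The function names `grow_with_rbind` and `fill_preallocated` and the toy row contents are assumptions made for illustration; they are not taken from the post or from the package being refactored.

```r
# Hypothetical helpers for illustration only; not from the original post.

# Grows the matrix one row at a time: each rbind() call copies all existing rows.
grow_with_rbind <- function(n) {
  m <- matrix(numeric(0), nrow = 0, ncol = 2)
  for (i in seq_len(n)) {
    m <- rbind(m, c(i, i^2))
  }
  m
}

# Allocates the full matrix once, then fills each row in place.
fill_preallocated <- function(n) {
  m <- matrix(NA_real_, nrow = n, ncol = 2)
  for (i in seq_len(n)) {
    m[i, ] <- c(i, i^2)
  }
  m
}

# Both approaches build the same matrix; only the amount of copying differs.
same_result <- identical(grow_with_rbind(100), fill_preallocated(100))
```

Wrapping each call in `system.time()` with a larger `n` is one way to see the difference in cost on your own machine.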

Unraveling the term “Validation”: Join the Discussion at the R Validation Hub Community Meeting on February 20, 2024 

Dive into the world of validation at the first R Validation Hub community meeting of the year! What defines a validated R package? Is it ensuring reproducibility across systems? Prioritizing... The post Unraveling the term “Validation”: Join the Discussion at the R Validation Hub Community Meeting on February 20, 2024  appeared first ... Continue reading: Unraveling the term “Validation”: Join the Discussion at the R Validation Hub Community Meeting on February 20, 2024  http://dlvr.it/T2gB9G

Du Bois Visualization Challenge

Slave and Free Negroes – W.E.B Du Bois. Recreating the data visualization of W.E.B Du Bois from the 1900 Paris Exposition using modern tools. https://github.com/ajstarks/dubois-data-portraits/tree/master/challenge/2024 Config librar... Continue reading: Du Bois Visualization Challenge http://dlvr.it/T2fmM0

Emacs as IDE for R

Recently I have seen many posts about which IDE for R people prefer, with minimalist lists of options, usually of size 2: RStudio and VS Code. I guess that some people forget, or many don't even know, about two of the most powerful text editors that have been helping developers ... Continue reading: Emacs as IDE for R http://dlvr.it/T2fm8s

Favorite apps on my Mac

Will do a separate post on command line setup and tools. VSCode is in the list and I’ll do a separate post on the extensions I have installed. For each application, I have indicated the price with one of the following icons, indicating ... Continue reading: Favorite apps on my Mac http://dlvr.it/T2fRCj

Aggregating Measures of Uncertainty

There are many situations where you want to aggregate values; however, if those values are on different scales or are related to measures of uncertainty, it’s typically more complicated than taking a simple mean or sum. As an example, say your... Continue reading: Aggregating Measures of Uncertainty http://dlvr.it/T2fQxD

Baking the cake dataset cake

Now that I’ve got my hands on the source of the cake dataset I knew I had to attempt to bake the cake too. Here, the emphasis is on attempt, as there’s no way I would be able to actually replicate the elaborate and cake-scientifically rigorous re... Continue reading: Baking the cake dataset cake http://dlvr.it/T2cm0x

Taylor Swift and Data Analysis

By Jerry Tuttle. Who will be the most talked-about celebrity before, during, and after the Super Bowl? Continue reading: Taylor Swift and Data Analysis http://dlvr.it/T2cQ8V

Unveiling Roman Amphitheaters with a ggplot2 violin plot

1. What is a violin plot? A violin plot is a mirrored density plot that is rotated 90 degrees as shown in the picture. It depicts the distribution of numeric data. 2. When should you use a violin plot? A violin plot is useful to compare the ... Continue reading: Unveiling Roman Amphitheaters with a ggplot2 violin plot http://dlvr.it/T2cQ1W

Reproducible and Reliable Shiny Apps for Regulatory Submissions

Shiny Apps have emerged as a powerful tool, transforming data analysis and visualization. These interactive web applications are crafted using the R programming language, and no knowledge of HTML, CSS, or JavaScript is required. When it comes to industries governed by stringent regulations, such as pharmaceuticals and healthcare, an important ... Continue reading: Reproducible and Reliable Shiny Apps for Regulatory Submissions http://dlvr.it/T2XDnQ

Navigating the Bayesian Landscape: From Concepts to Application

In the diverse universe of statistical analysis, Bayesian statistics stands as a beacon of a distinct approach to understanding and interpreting the world through data. Unlike the classical, or frequentist, approach to statistics that many are familiar... Continue reading: Navigating the Bayesian Landscape: From Concepts to Application http://dlvr.it/T2VtLv

Tweedie regression, or Poisson-Gamma regressions ?

Yesterday, I was chatting with a young and enthusiastic actuary, who asked a nice (and classical) question: is using a Tweedie regression the same as using two regressions (Poisson and Gamma)? For distributions, the two are equivalent, but when we have heterogeneity and explanatory variables, I really ... Continue reading: Tweedie regression, or Poisson-Gamma regressions ? http://dlvr.it/T2VtDH

How to Check if a Column is a Date in R: A Comprehensive Guide with Examples

Introduction: As an R programmer, you may often encounter datasets where you need to determine whether a column contains date values. This task is crucial for data cleaning, manipulation, and analysis. In this blog post, we’ll explore various met... Continue reading: How to Check if a Column is a Date in R: A Comprehensive Guide with Examples http://dlvr.it/T2Vt7C

Reading notes on Producing open source software by Karl Fogel (First edition)

I recently re-read Nadia Eghbal’s Working in public. This time around, I noticed her mention of the book “Producing open source software” by Karl Fogel. It is a book about the people aspect of open-source projects, including money, an... Continue reading: Reading notes on Producing open source software by Karl Fogel (First edition) http://dlvr.it/T2Vsz9

Improving with R: Kylie Bemis Unveils Enhanced Signal Processing with Matter 2.4 Upgrade

The R Consortium recently connected with Kylie Bemis, assistant teaching professor at the Khoury College of Computer Sciences at Northeastern University. She has a keen interest in statistical computing frameworks... The post Improving with R: Kylie Bemis Unveils Enhanced Signal Processing with Matter 2.4 Upgrade appeared first on R Consortium. Continue reading: Improving with R: Kylie Bemis Unveils Enhanced Signal Processing with Matter 2.4 Upgrade http://dlvr.it/T2SThd

Testing Containers and WebAssembly in Submissions to the FDA

The R Consortium Submission Working Group has now successfully made two pilot submissions to the FDA. All the submissions done by the group are focused on improving practices for R-based clinical trial regulatory submissions. Now, the R Submission Working Group, in collaboration with Appsilon and Posit, is exploring new technologies ... Continue reading: Testing Containers and WebAssembly in Submissions to the FDA http://dlvr.it/T2S8PP

How to Check if Date is Between Two Dates in R

Introduction: Hello fellow R enthusiasts! Today, we’re diving into a common task in data analysis and manipulation: checking if a date falls between two given dates. Whether you’re working with time-series data, financial data, or any other type ... Continue reading: How to Check if Date is Between Two Dates in R http://dlvr.it/T2S8BX

Level Up Your R/Shiny Team Skills – Download Ebook

Are you ready to take your R/Shiny skills to the next level? Whether you’re a seasoned developer or just starting your journey, we have the perfect opportunity for you to enhance your team’s collaboration, efficiency, and overall effectiveness in building stunning Shiny applications. Join us for our ... Continue reading: Level Up Your R/Shiny Team Skills – Download Ebook http://dlvr.it/T2PYS5

Please Shut Up! Verbosity Control in Packages

We recently introduced a new paragraph to the development version of our dev guide: Provide a way for users to opt out of verbosity, preferably at the package level: make message creation dependent on an environment variable or option (like “use... Continue reading: Please Shut Up! Verbosity Control in Packages http://dlvr.it/T2PFQD

new programming with data.table

The newest version of data.table has hit CRAN, and there are lots of great new features. Among them, a %notin% function, a new let function that can be used instead of := (I wasn’t too fussed about this originally but have tried it a few ti... Continue reading: new programming with data.table http://dlvr.it/T2PF63
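The two features named in this excerpt can be sketched briefly. This is a hedged illustration, assuming data.table 1.15.0 or later (the release that introduced `%notin%` and `let()`); the table `dt` and its columns are invented for the example and do not come from the post.

```r
library(data.table)  # assumes data.table >= 1.15.0, which added %notin% and let()

# Toy data, invented for illustration.
dt <- data.table(id = 1:5, grp = c("a", "b", "c", "a", "b"))

# %notin% is the negation of %in%: keep rows whose grp is neither "b" nor "c".
kept <- dt[grp %notin% c("b", "c")]

# let() adds or updates columns by reference, as an alternative spelling of `:=`.
dt[, let(id_sq = id^2, is_a = grp == "a")]
```

After the last call, `dt` itself carries the new columns `id_sq` and `is_a`, because assignment by reference modifies the table in place.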

new programming with data.table

baby steps creating handy functions with the new data.table programming interface Continue reading: new programming with data.table http://dlvr.it/T2MCBL

Getting marine polygon maps in R

Another frequent question of my students is how to obtain a polygon map of the seas and oceans, rather than the land polygons (countries, etc.) that are commonly imported with R spatial data packages. You can mostly just use the … Continue reading: Getting marine polygon maps in R http://dlvr.it/T2M0MK

Optimize your images with R and reSmush.it

Compress the size of your images with R, resmush and reSmush.it Continue reading: Optimize your images with R and reSmush.it http://dlvr.it/T2M087

Taming Excel Dates in R: From Numbers to Meaningful Dates!

Introduction: Have you ever battled with Excel’s quirky date formats in your R projects? If so, you’re not alone! Those cryptic numbers can be a real headache, but fear not, fellow R warriors! Today, we’ll conquer this challenge and transform tho... Continue reading: Taming Excel Dates in R: From Numbers to Meaningful Dates! http://dlvr.it/T2LMm8

Version 1.1.0 of NIMBLE released

We’ve released the newest version of NIMBLE on CRAN and on our website. NIMBLE is a system for building and sharing analysis methods for statistical models, especially for hierarchical models and computationally-intensive methods (such as MCMC, Laplace approximation, and SMC). This release provides new functionality as well as various ... Continue reading: Version 1.1.0 of NIMBLE released http://dlvr.it/T2JfGk

more .I in data.table

Following on from my last post, here is a bit more about the use of .I in data.table. Scenario: you want to obtain either the first, or last row, from a set of rows that belong to a particular group. For example, for a patient admitted to ... Continue reading: more .I in data.table http://dlvr.it/T2Jf77
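The scenario in this excerpt (first or last row per group) can be sketched with `.I`, which holds row numbers of the full table. The `admissions` table, its columns, and the ward values below are assumptions invented for illustration, since the excerpt is cut off before the post's own example.

```r
library(data.table)

# Toy admissions data, invented for illustration (not from the post).
admissions <- data.table(
  patient = c("p1", "p1", "p1", "p2", "p2"),
  ward    = c("A", "B", "C", "A", "D")
)

# .I[1] within `by` gives the row number of each group's first row;
# the unnamed result column is called V1.
first_idx <- admissions[, .I[1], by = patient]$V1

# .I[.N] gives the row number of each group's last row (.N is the group size).
last_idx <- admissions[, .I[.N], by = patient]$V1

# Use the collected row numbers to subset the original table.
first_rows <- admissions[first_idx]
last_rows  <- admissions[last_idx]
```

Collecting row numbers first and subsetting once keeps every column of the original row, without having to list the columns inside the grouped call.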

Natalia Andriychuk on RUGs, Pfizer R Center of Excellence, and Open Source Projects: Fostering R Communities Inside and Out

The R Consortium recently talked with Natalia Andriychuk, Statistical Data Scientist at Pfizer and co-founder of the RTP R User Group (Research Triangle Park in Raleigh, North Carolina), to get... The post Natalia Andriychuk on RUGs, Pfizer R Center of Excellence, and Open Source Projects: Fostering R Communities Inside and ... Continue reading: Natalia Andriychuk on RUGs, Pfizer R Center of Excellence, and Open Source Projects: Fostering R Communities Inside and Out http://dlvr.it/T2Gltr

Big Book of R at 400 [New milestone!]

Drumroll please……………….!!!!! With the addition of these 7 new books, the collection now stands at over 400 entries of (mostly) free R books! Many thanks to Markus Gesmann, Jacobus, Max Cotera, Luis, Olivier Leroy and Gary for their latest contributions. This is truly a one-of-a-kind resource. I want to give … The post ... Continue reading: Big Book of R at 400 [New milestone!] http://dlvr.it/T2GlkG

Object Oriented Programming in R Part 2: S3 Simplified

In the previous article we made our first steps in Object Oriented Programming in R and learned that there are multiple ways of doing it. In this article, we will dive deeper into the S3 system – the first object-oriented system in R. Fun fact: if you have used R, you ... Continue reading: Object Oriented Programming in R Part 2: S3 Simplified http://dlvr.it/T2GlXc

Accounts Receivables Pathways in SQL

Yesterday I was working on a project that required me to create a SQL query to generate a table of accounts receivables pathways. I thought it would be interesting to share the SQL code I wrote for this task. The code is as follows: -- Create the... Continue reading: Accounts Receivables Pathways in SQL http://dlvr.it/T2GlKL

Reproducible data science with Nix, part 9 — rix is looking for testers!

After 5 months of work, Philipp Baumann and I are happy to announce that our package, {rix}, is getting quite close to being in a state we consider “done” (well, at least, for a first release). We plan to submit it first to rOpenSci for review, and later to CRAN. But ... Continue reading: Reproducible data science with Nix, part 9 — rix is looking for testers! http://dlvr.it/T2Gl47

simulating signed mixtures

While simulating from a mixture of standard densities is relatively straightforward, when the component densities are easily simulated, to the point that many simulation methods exploit an intermediary mixture construction to speed up the production of pseudo-random samples from more challenging distributions (see Devroye, 1986), things get surprisingly more complicated when ... Continue reading: simulating signed mixtures http://dlvr.it/T2C6qT

Shadow and Substance: Unveiling the Twin Mysteries of Correlation and Covariance

In the grand tapestry of statistical analysis, the threads of correlation and covariance weave a complex narrative, telling stories hidden within data. Like twin stars in a vast galaxy of numbers, these concepts illuminate the relationships and pattern... Continue reading: Shadow and Substance: Unveiling the Twin Mysteries of Correlation and Covariance http://dlvr.it/T2C6bw

R for the Real World: Counting those Business Days like a Pro!

Introduction: Hi fellow coders, data wranglers, and all-around R enthusiasts! Have you ever been stuck calculating the number of business days between two dates? You know, like figuring out how long that project actually took, excluding weekends ... Continue reading: R for the Real World: Counting those Business Days like a Pro! http://dlvr.it/T2C6QD

LinkedIn Releases Their Report on the Top 25 Hottest Jobs in the US in 2024

Looking for a career change in 2024? For those working as Data Scientists, we have breaking news: you are all AI Engineers now. LinkedIn recently released their report on the top 25 fastest growing jobs in the US. 📈 LinkedIn’s Editor-In-Chief even did an interview about it on the Today Show. In ... Continue reading: LinkedIn Releases Their Report on the Top 25 Hottest Jobs in the US in 2024 http://dlvr.it/T2C68w