
R For Business Analytics Pdf Free Download

R for Business Analytics looks at some of the most common tasks performed by business analysts and helps the user navigate the wealth of information in R and its 4000 packages. With this information the reader can select the packages that help complete analytical tasks with minimum effort and maximum usefulness. The use of graphical user interfaces (GUIs) is emphasized in this book to further flatten the famous learning curve of R. The book aims to help you get started with analytics and includes chapters on data visualization, code examples on web analytics and social media analytics, clustering, regression models, text mining, data mining models, and forecasting. It tries to expose the reader to a breadth of business analytics topics without burying the user in needless depth. The included references and links allow the reader to pursue business analytics topics further.

This book is aimed at business analysts with basic programming skills who want to use R for business analytics. Note that the scope of the book is neither statistical theory nor graduate-level research in statistics; rather, it is written for business analytics practitioners.

Business analytics (BA) refers to the field of exploration and investigation of data generated by businesses. Business intelligence (BI) is the seamless dissemination of information through the organization, which primarily involves business metrics, both past and current, for decision support in businesses. Data mining (DM) is the process of discovering new patterns from large data using algorithms and statistical methods. To differentiate between the three: BI is mostly current reports, BA is models to predict and strategize, and DM matches patterns in big data. The R statistical software is the fastest growing analytics platform in the world, and is established in both academia and corporations for robustness, reliability and accuracy.
The book follows Albert Einstein's famous remark about making things as simple as possible, but no simpler. This book will dispel the last remaining doubts in your mind about using R in your business environment. Even non-technical users will enjoy the easy-to-use examples. The interviews with creators and corporate users of R make the book very readable. The author firmly believes Isaac Asimov was better at spreading science than any textbook or journal author. © Springer Science+Business Media New York 2012. All rights are reserved.

  • A. Ohri

In this chapter we introduce the reader to R, discuss reasons for choosing R as an analytical and not just a statistical computing platform, make comparisons with other analytical software, and present some broad costs and benefits in using R in a business environment.

  • A. Ohri

In this chapter we discuss the practical realities in setting up an analytical environment based on R, including hardware, software, budgeting, and training needs. We will also walk through the basics of installing R, R's library of packages, updating R, and accessing the comprehensive user help.

  • A. Ohri

In this chapter we discuss the various ways to interface R and to use R analytics based on one's needs. We will cover how to minimize the time spent learning to perform tasks in R by using a GUI instead of the command line. In addition, we will learn how to interface to R from other software as well as use it from an Amazon cloud computing environment. We will also discuss the relative merits and demerits of various R interfaces.

  • A. Ohri

R has different types of data storage such as lists, arrays, and data frames. This can be confusing for analysts whose background is purely in handling rectangular datasets (with rows for records and columns for variables). The first and often the toughest or most time-consuming task in an analytical environment for a new project is getting the data loaded into the analytical software. This chapter discusses the techniques for reading in data from various formats. The two main methods of inputting data are through the command line and a GUI; different packages for bigger datasets (>1 GB) are also discussed. In addition, obtaining data from various types of databases is specifically mentioned. Analyzing data can have many challenges associated with it. In the case of business analytics data, these challenges or constraints can have a marked effect on the quality and timeliness of the analysis as well as the expected versus actual payoff from the analytical results.

  • A. Ohri

While Chap. 4 dealt with getting your data in shape for processing (or, as it is commonly known, data preprocessing), in this chapter we actually start the process of looking at slices of data for generating various insights. We will emphasize the need for data visualization both as an acknowledgement of growing demands of data volume and easy understandability by business audiences. The fact that R currently has one of the most advanced graphical libraries also helps. We will be using basic graphical capabilities but will also briefly touch on advanced customization using the acclaimed ggplot2 package.

  • A. Ohri

One of the most common uses of statistical software is for building models, specifically logistic regression models for propensity in the marketing of goods and services. Within the R Project, regression packages are shown in the documentation in both the Econometrics view (http://cran.cnr.berkeley.edu/web/views/Econometrics.html) and the Finance view. A basic summary of all the R functions used for building regression models can be seen at http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf. A very good textbook on the basics of regression is Practical Regression and Anova Using R by Julian J. Faraway (available for free at http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf).
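A logistic regression of the kind mentioned above can be sketched in a few lines of base R. This is only an illustration: the mtcars dataset that ships with R stands in for marketing data, and treating the binary column am as the "response" is an assumption made for the example, not something taken from the book.

```r
# Illustrative sketch only: mtcars stands in for a marketing dataset, and the
# binary column `am` plays the role of a yes/no response (an assumption).
data(mtcars)

# Fit a logistic regression: probability that am == 1 given weight and horsepower.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Coefficients are on the log-odds scale.
print(coef(fit))

# Predicted probabilities ("propensity scores") for each record.
propensity <- predict(fit, type = "response")
print(head(round(propensity, 3)))
```

In a marketing setting, records would typically be ranked by the predicted propensity so that the highest-scoring customers can be targeted first.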

  • A. Ohri

Data mining is a commonly used term that is interchangeably used with business analytics, but it is not exactly the same.

  • A. Ohri

Cluster analysis is basically a data reduction technique to reduce a large number of objects in groups or clusters in such a manner that objects belonging to one group or cluster are more similar to each other and more different from objects in another group or cluster. Clustering is used in business analytics to identify groups of customers that can be targeted with similar products, to understand products and markets, and basically to reduce data for an actionable strategy especially in cases where data are not sufficiently clean or exhaustive to create predictive models.
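The grouping idea described above might look as follows with the k-means implementation in base R. The iris dataset, the choice of three clusters, and the seed are assumptions made purely for illustration; customer data would take the place of the measurements in a business setting.

```r
# Illustrative sketch only: iris measurements stand in for customer attributes.
set.seed(42)  # k-means starts from random centers; fix the seed for repeatability

# Standardize the numeric columns so that no single variable dominates the distance.
features <- scale(iris[, 1:4])

# Partition the 150 records into 3 groups (the number of groups is an assumption).
km <- kmeans(features, centers = 3, nstart = 25)

# Size of each cluster, and how the clusters line up with the known species.
print(km$size)
print(table(cluster = km$cluster, species = iris$Species))
```

Each record receives a cluster label in km$cluster, which can then be joined back to the original data to profile the groups.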

  • A. Ohri

Time series are series in which some quantity or variable varies with respect to time intervals (in the form of months, weeks, days, hours, etc.). This basically implies that the future value of a particular variable is in some way related to its present value as well as to the time interval difference.
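The dependence of future values on present ones can be illustrated with a short base R sketch. The AirPassengers dataset that ships with R and the choice of Holt-Winters smoothing are assumptions made for this example; the chapter itself may use different data and methods.

```r
# Illustrative sketch only: AirPassengers is a monthly series bundled with R.
data(AirPassengers)

# Autocorrelation measures how strongly a value is related to earlier values.
ap_acf <- acf(AirPassengers, lag.max = 12, plot = FALSE)
lag1 <- ap_acf$acf[2]  # element 1 is lag 0, element 2 is a lag of one month
print(round(lag1, 2))

# Fit a Holt-Winters model (level, trend, and seasonal components)
# and forecast the next 12 months.
hw_fit <- HoltWinters(AirPassengers)
hw_forecast <- predict(hw_fit, n.ahead = 12)
print(hw_forecast)
```

A lag-one autocorrelation close to 1 indicates that the current month is a strong guide to the next one, which is the property that forecasting methods exploit.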

  • A. Ohri

Data export, and saving results, graphs, and code are important to help complete the final documentation and presentation for an analytical project. What are the various formats available in R for exporting graphs? The function capabilities() can be used to obtain a list of exportable formats for graphs.
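As a minimal sketch of how this check might look in practice (base R only; the file name example_plot.pdf is made up for illustration), capabilities() can be queried for specific bitmap formats before a graphics device is opened. The pdf() device is used for the actual export here because it does not depend on those optional capabilities.

```r
# Ask R which optional bitmap formats this build supports.
graphics_caps <- capabilities(c("jpeg", "png", "tiff"))
print(graphics_caps)

# Export a plot to a file in the session's temporary directory.
# The file name is hypothetical; any writable path would do.
out_file <- file.path(tempdir(), "example_plot.pdf")
pdf(out_file, width = 6, height = 4)
plot(cars, main = "Speed vs. stopping distance")
invisible(dev.off())  # close the device so the file is written to disk

print(file.exists(out_file))
```

If graphics_caps reports TRUE for a format such as png, the matching device function, png(), can be used in place of pdf() in the same open, plot, close pattern.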

  • A. Ohri

As the previous chapters have shown, multiple techniques are available in R for powerful data-driven insights and analysis. For the average business analyst, well-designed GUI tools that are stable to use, pull data and models, and report them are essential, and all these are available within various R subcomponents and packages. This chapter is aimed at analysts wishing to tweak their overall R experience by measuring R performance and improving it using some of the well-known and some recently introduced utilities.

  • A. Ohri

Blogs, email help groups, and Web sites are important sources of training literature as well as tutorials. While choosing the mix of books, journal articles, blog posts, and online content is often a matter of personal choice, the reader should choose based on his or her own business or analytical needs.

  • A. Ohri

Google Analytics is the most widely used Web analytics software on the Internet, and using R we can do advanced analytics or build a custom Web analytics solution with it.

... From each group, the ten leading documents were selected for analysis (Table 6); the selection criterion was their PageRank (Page et al., 1999), an indicator that allows the best works within a group to be selected quantitatively, based on citation indicators (Ding et al., 2009; Yan et al., 2010). Finally, to help identify the themes of each cluster, text mining programmed in R was used through the WordCloud package (Ohri, 2012), which generates word clouds from the titles and keywords of all the documents that make up the clusters. Table 6 lists the clusters with their respective themes. ...

Rapid environmental deterioration, growing social inequality, and frequent economic crises have led society, and companies in particular, to change their behavior and become increasingly aware of their role in sustainability, all without setting aside their economic interests. Given the growing academic and professional interest in these topics, this study presents a literature review, based on scientific mapping, of the link between corporate finance and sustainability. To that end, a search was conducted of the scientific output registered in WoS and Scopus over the last two decades; subsequently, using bibliometric tools and network analysis, the current knowledge structure of the topic is presented and three emerging research streams are identified: sustainability and financial performance, financial results of corporate social responsibility, and sustainability reporting. Finally, an agenda for future research is presented.

... Once the articles of greatest interest have been selected, an objective analysis of the topic and its perspectives is presented. The perspectives are determined through co-citation analysis using the clustering algorithm (Blondel et al., 2008), a process in which the themes that compose them are identified and analyzed through text mining in R, specifically with the Wordcloud algorithm (Ohri, 2013; Robledo-Giraldo et al., 2013). ...

Neuroeconomics is a multidisciplinary field that brings together knowledge from areas such as economics, psychology, and neuroscience, and that studies brain behavior in decision making. Through a literature review, the evolution of research in neuroeconomics is presented. To that end, scientific mapping techniques supported by bibliometric tools are used. The search was carried out in the WoS and Scopus databases, and the information obtained was processed with the Bibliometrix and Gephi tools. The documents were classified according to their relevance into three categories: classic, structural, and current. Then, through co-citation analysis and clustering, five lines or streams of research in the area were identified and analyzed, namely: economic choices, social choice, considerations on neuroeconomics, consumer neuroscience, and brain behavior and stimulation. The review concludes that a research methodology needs to be found and standardized, one in which criteria converge to strengthen the results of the research carried out, since the limitations of current methodologies cannot be ignored.

... To determine the perspectives of CSR and entrepreneurship, a co-citation analysis was carried out using the clustering algorithm (Blondel et al. 2008). Then, the aspects that make up each perspective were determined and analyzed using text mining and web scraping through RStudio, specifically employing the Wordcloud algorithm (Ohri, 2012). The three largest areas or perspectives in the network were identified and chosen; together they account for 43% of the documents in the network. ...

Corporate social responsibility and entrepreneurship are topics of great breadth and theoretical and practical importance, which have been approached from diverse perspectives and are applicable in different settings. Although interest in both topics has increased in recent years, no literature review has been found that shows their evolution and trends through citation analysis. The purpose of this article was therefore to identify emerging research trends in these topics, covering the literature in Web of Science and Scopus through scientific mapping. To do so, a bibliometric analysis of the scientific output was carried out, and the tree analogy was used to classify the main documents. The results showed that corporate social responsibility is applicable to any type of organization; however, in small companies and start-ups it is limited by pressure from stakeholders. The citation analysis revealed three perspectives or streams of research: performance and sustainability; institutional and organizational change; and shared value and social entrepreneurship. In practical terms, managers and entrepreneurs can implement innovative and sustainable strategies under integrative social responsibility models.

... To generate the perspectives of CSR and governance, the clustering algorithm (Blondel et al., 2008) was used, as proposed by Buitrago et al. (2019). Finally, the themes that make up each perspective were identified and analyzed by means of text mining in R, employing the Wordcloud algorithm (Ohri, 2012). ...

Corporate social responsibility (CSR) and its relationship with governance is a topic of growing importance, given the responsibility of corporations to help address social and environmental problems; although CSR reporting is an increasingly common practice in organizations, knowledge gaps remain at the research level. This paper proposes a literature review using the ToS (Tree of Science) methodology, analyzing the co-citations of articles published in Web of Science (WoS) between 2001 and 2019 in order to generate the three most relevant perspectives by means of clusters. The results show that the perspectives are oriented toward CSR reporting and the composition of boards of directors, the effect of gender diversity in corporate governance, and integrated reporting and the generation of economic value. Finally, it is concluded that policies need to be implemented in corporations and companies in emerging economies to promote diverse participation on boards; further research is likewise needed to analyze and identify the evolution of the relationship between governance and CSR.

... This technique makes it possible to categorize the documents through co-citation analysis. Finally, text mining is used to identify the themes that make up the perspectives, employing R and specifically the WordCloud package (Ohri, 2012). See Table 5. ...

Abstract: The purpose of this study is to conduct a systemic analysis of research on the internationalization of Latin American companies. For this purpose, network analysis is carried out, complemented with bibliometric tools, thus achieving the identification of the perspectives on the subject. Initially, a thematic search was done in the WoS and Scopus databases; the references obtained were processed through bibliometrix and Gephi, facilitating the classification of the documents according to their relevance in three categories: classical, structural, and recent. Finally, a co-citation analysis is made. This last procedure allowed the identification of four main perspectives in the area: diversification and performance; entrepreneurship; exports and market; and strategies and competition. Keywords: internationalization; Latin American companies; competitiveness; diversification; emerging economy; scientific mapping. JEL: F23.

... The perspectives of corporate branding (CB) are determined through co-citation analysis, by applying a clustering algorithm (Blondel, Guillaume, Lambiotte & Lefebvre, 2008), a procedure proposed by Robledo et al. (2013), in which the themes that make up each perspective are finally identified and analyzed using text mining in R, specifically employing the Wordcloud algorithm (Ohri, 2012). Figure 1 shows the number of articles generated from the initial query. ...

The success of a company is based not only on the quality of its products but also on how its brand is remembered in the market, a process known as corporate branding. This type of strategy has been recognized as an important area in the management sciences and needs a detailed review. The purpose of this article is to show the evolution of corporate branding through a literature review; to that end, a network analysis supported by bibliometric techniques is carried out, and the current perspectives of this field of study are presented. A query was run in Web of Science, the most important documents were then classified through the Tree of Science web platform, and finally a co-citation analysis was performed. This last analysis showed that corporate branding has three perspectives: the brand effect generated by corporate alliances, the influence of social responsibility, and the management of brand identity.

... The R packages can provide solutions to practical problems in a wide variety of research domains, such as social science [2], bioinformatics [3], [4], geosciences [5], business analysis [6] and clinical sciences [7], to list just a few. Some packages focus on generic implementations of a collection of popular data analytic methods. ...

  • Ruizhu Huang
  • Weijia Xu

The software package R is a free, powerful, open source software package with extensive statistical computing and graphics capabilities. Due to its high-level expressiveness and multitude of domain-specific packages, R has become a popular tool for data analysis in many scientific fields. While there are a number of packages that enable running R in parallel using the message passing interface across multiple nodes, only a few packages extend R to the new systems and computing paradigms for data-intensive computing, such as Hadoop and Spark. In this paper, we focus on three approaches, RHadoop, RHIPE and SparkR, that can scale R to distributed computing systems for solving Big Data problems. We present an algorithm for enabling logistic regression over large data sets under the MapReduce programming model. We implemented the algorithm with three packages in R to exploit the benefits of Hadoop and Spark clusters. Our implementations significantly improved the scale of the data that can be analyzed with R. We conducted a study of performance and scalability on up to 1 TB of data with those implementations and three other common solutions to the logistic regression problem. The results showed that SparkR consistently outperformed the other approaches and also demonstrated the advantages and limitations of each package.

... On the other hand, data analysts commonly use tools like R to perform data analysis and provide decision support for managers. The basic analytical steps, given a dataset, are: data input, data processing, descriptive statistics, data visualization, model or report creation, output and presentation (Ohri, 2012). Statistical tools are currently unable to access RDF data sets to perform standard analysis tasks in the way they can with table-formatted datasets. ...

  • Nagore Espinosa

Tourism is an economic sector embodying numerous industries and consequently agents and stakeholders of diverse backgrounds. The literature tends to maintain that cooperation among these stakeholders fosters better performance for the destination (Timothy, 1999; Buhalis, 2000; Go and Appleman, 2000; Araujo & Bramwell, 2002). However, this has not been empirically tested. The present thesis develops a methodology (which includes studying network embeddedness via Gephi) to empirically measure cooperation among meetings industry stakeholders in city tourism destinations, and designs a model to measure tourism and meetings industry performance, at the city level, where cooperation is included as an indicator. Cooperation's weight on performance is then studied to validate or refute its impact on performance, and under which macro- and microeconomic contexts. Keywords: cooperation measurement, network embeddedness, tourism performance, and meetings industry performance

  • Andrée Marie López-Fernández

The main objective of the study is to assess current settings of business analytics around the world in relation to social responsibility. The specific objectives are to analyze the ethical ramifications and norms applicable to business analytics, and to propose a model that puts forth standards of practice congruent with both business strategic objectives and corporate social responsibility, to ensure effective stakeholder engagement and firm performance. To do so, a series of propositions and a conceptual model illustrating the proposed association of constructs and variables are presented, as well as a discussion of the managerial implications of such an association.

The purpose of this paper is to explore the intellectual structure of viral marketing, its evolution, and its research trends. To that end, using scientometric techniques and tools, a citation analysis was carried out on 188 publications on the topic, from 2000 to 2020, in the Web of Science database. This article identifies three subareas that have emerged from the literature. The first focuses on the factors that influence consumers' decision making when they interact with content generated by the company. The second explores how viral marketing has been boosted by the arrival of social networks. Finally, the third studies the use of influencers as a driving element of campaigns of this type. This work explores the intersection between the marketing discipline and the use of social networks, in particular how the latter serve as a means of spreading the messages of companies and consumers.

Long-term care (LTC) delivered to elderly persons in need of assistance in activities of daily living is a topic of increasing importance. The financing of LTC, the needs for specialized infrastructure and the limited number of caregivers will pose a systemic threat in many developed countries. In this paper, we analyze the factors influencing the old-age care prevalence rates in Switzerland through a log-linear regression model. Based on a cross-sectional dataset covering the LTC needs from 1995 to 2014, we statistically support the effect of key drivers such as the age, the gender and the region of residence. We distinguish the prevalence by the mild, moderate and severe frailty levels and by care received either at home or in an institution. Our regression results evidence that prevalence rates exponentially increase with the age yielding significantly higher values for women. These effects are emphasized for moderate and severe dependence and for institutional care. Finally, we forecast the number of dependent persons until 2045. Our projections reveal an important increase in the future numbers. While we observe that the dependent population more than doubles over the 30-year horizon, we report significant cantonal differences. Our results are relevant to governments, practitioners and academics alike and help to better understand the factors affecting the demand of LTC and predicting future needs.

Over the years, R has been adopted as a major data analysis and mining tool in many domain fields. As Big Data overwhelms those fields, the computational needs and workload of existing R solutions increase significantly. With recent hardware and software developments, it is possible to enable massive parallelism with existing R solutions with little to no modification. In this paper, we evaluated approaches to speed up R computations with the utilization of the Intel Math Kernel Library and automatic offloading to the Intel Xeon Phi SE10P co-processor. The testing workload includes a popular R benchmark and a practical application in health informatics. There are up to fivefold speedup gains from using MKL with 16 cores, without modification to the existing code, for certain computing tasks. Offloading to the Phi co-processor further improves the performance. The performance gains through parallelization increase as the data size increases, a promising result for adopting R for big data problems in the future.

  • Garrett Grolemund
  • Hadley Wickham

This paper presents the lubridate package for R, which facilitates working with dates and times. Date-times create various technical problems for the data analyst. The paper highlights these problems and offers practical advice on how to solve them using lubridate. The paper also introduces a conceptual framework for arithmetic with date-times in R.

  • Markus Gesmann
  • Diego de Castillo

The googleVis package provides an interface between R and the Google Visualisation API to create interactive charts which can be embedded into web pages. The best known of these charts is probably the Motion Chart, popularised by Hans Rosling in his TED talks. With the googleVis package users can easily create web pages with interactive charts based on R data frames and display them either via the local R HTTP help server or within their own sites.

  • Hadley Wickham

A grammar of graphics is a tool that enables us to concisely describe the components of a graphic. Such a grammar allows us to move beyond named graphics (e.g., the "scatterplot") and gain insight into the deep structure that underlies statistical graphics. This article builds on Wilkinson, Anand, and Grossman (2005), describing extensions and refinements developed while building an open source implementation of the grammar of graphics for R, ggplot2. The topics in this article include an introduction to the grammar by working through the process of creating a plot, and discussing the components that we need. The grammar is then presented formally and compared to Wilkinson's grammar, highlighting the hierarchy of defaults, and the implications of embedding a graphical grammar into a programming language. The power of the grammar is illustrated with a selection of examples that explore different components and their interactions, in more detail. The article concludes by discussing some perceptual issues, and thinking about how we can build on the grammar to learn how to create graphical "poems." Supplemental materials are available online.

Source: https://www.researchgate.net/publication/266955709_R_for_Business_Analytics

Posted by: claudioruyz.blogspot.com