Although the most flexible and powerful data analytics tools arguably live within coding languages such as Python and R, tools like Google Analytics and Mixpanel cannot be ignored, because they are far more approachable for less technical users. Without writing any code, individuals can use these tools to segment and compare data, analyze trends over time and across features, explore correlational relationships, and visualize results, all on a platform that nontechnical strategists and executives can share with more technical data scientists and analysts, enabling teams of widely varied technical skill to explore and understand data together. …
As data science tools have enabled machines to perform complex analysis, prediction, and transformation not just on extremely large text datasets but on visual and audio media as well, machine learning has become an emerging hot topic. What was once science fiction has become reality, thanks to the advanced computational capacities of machines and the massive sets of data available online.
With image processing, audio processing, and natural language processing techniques being propelled forward by machine learning and deep learning (neural network) algorithms, there is enormous hype around machine learning, deep learning, and “Big Data” or “Data Science” right now, stirring up speculation about what machine learning will be able to do in the near and distant future. Many in the realm of data science may disagree, often because open skepticism could threaten the multimillion to multibillion dollar funding that AI projects have recently enjoyed. But deep learning, pushed onto anything and everything theoretically feasible as a business application of AI algorithms, can be massively computationally inefficient, and simply unreliable and inferior to more straightforward machine learning and human-led analysis and research methods. …
From my undergraduate studies in psychology and public policy through my experience developing a nonprofit, ecology-oriented retreat center, interpreting research and its associated data to make informed, actionable policies has been key to my decision-making process, especially on the several campaigns I have been part of that aim to empower people to live more fulfilling, healthy, meaningful lives. To me, it is important that we acquire the knowledge that will enable us to live our lives in ways that genuinely satisfy and empower us, while making room for others to do the same.
So here I am, with this driving philosophy in mind, tinkering with a set of email newsletters sent to over 1,500 followers of my nonprofit group, one of my ongoing efforts to connect people to novel experiences and encourage creative collaboration. …
Organizing a pile of unstructured data into structured form can seem daunting if you don’t know how to automate and streamline the process. The purpose of this post is to show Mailchimp campaign makers how to turn the messy, disjointed CSVs Mailchimp exports for free into one clean, concise .csv dataset using the Python Pandas library, along with NumPy, without installing plugins or purchasing any service upgrades or external software. It helps if you’re familiar with Python, but even with no Python experience you can follow along, so long as you’re willing to spend 30 minutes downloading Python 3, Jupyter Notebooks, and the Pandas and NumPy libraries, ideally through the Anaconda Distribution mentioned in the Jupyter link. Doing this, then running “pip install numpy” and “pip install pandas” to connect these libraries, should let you follow along with my project’s code, hosted here on my Github repository. …
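To make that concrete, here is a minimal sketch of the kind of cleanup Pandas enables. The file names and cleanup steps are hypothetical placeholders rather than the exact steps from my notebook; the full version lives in the GitHub repository linked above.

```python
import pandas as pd
import numpy as np

# Hypothetical example: Mailchimp exports each campaign's stats as its
# own CSV, with inconsistent columns. File names here are placeholders.
files = ["campaign_jan.csv", "campaign_feb.csv", "campaign_mar.csv"]

# Read every export and stack them into one DataFrame.
frames = [pd.read_csv(f) for f in files]
combined = pd.concat(frames, ignore_index=True, sort=False)

# Normalize column names: strip whitespace, lowercase, replace spaces.
combined.columns = (combined.columns
                    .str.strip()
                    .str.lower()
                    .str.replace(" ", "_", regex=False))

# Turn empty strings into real missing values, then drop duplicate rows.
combined = combined.replace("", np.nan).drop_duplicates()

# Write the tidy result to a single CSV.
combined.to_csv("mailchimp_clean.csv", index=False)
```

The key idea is that pd.concat tolerates mismatched columns across exports, filling gaps with missing values instead of forcing you to reconcile each file by hand.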
As part of my immersive data science course at General Assembly, I designed a classification model in Python using natural language processing and basic machine learning techniques. The model determines the origin of a Reddit post, whether it came from the /r/futurology or /r/worldnews subreddit, though it can be generalized to compare other subreddits. The model succeeded, correctly identifying which subreddit a post belonged to about 83 to 91% of the time. …
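For readers curious what such a model can look like, here is a minimal sketch of one common NLP classification pipeline, a TF-IDF vectorizer feeding a logistic regression, built with scikit-learn. The input file and column names are hypothetical, and my actual project may have used different features or estimators.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical input: a CSV of post titles labeled with their subreddit.
posts = pd.read_csv("reddit_posts.csv")  # columns: "title", "subreddit"

X_train, X_test, y_train, y_test = train_test_split(
    posts["title"], posts["subreddit"], random_state=42)

# TF-IDF turns each title into word-frequency features; logistic
# regression then learns which words signal each subreddit.
model = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),
])

model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```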
My fascination with modeling four variables on a three-axis XYZ graph, with a single sliding scale controlling the fourth variable (just like in the video shared above!), began while I was preparing to enroll in my immersive data science course. I was inspired while reading up on Python and calculus, a class I flunked out of in my final semester of high school, having decided that the incredibly short-sighted premise that “you won’t always have a graphing calculator in your pocket to do this!!” no longer appealed to me as a rationale for forcing students to calculate tedious functions by hand. …
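As an illustration of the idea, here is a minimal matplotlib sketch, assuming toy random data, that scatters three variables in 3D and uses a slider to filter on a hypothetical fourth variable w. It is one way to approximate the interaction described above, not the exact tool from the video.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

# Toy data: three spatial variables plus a fourth, w, driven by the slider.
rng = np.random.default_rng(0)
x, y, z, w = rng.random((4, 200))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
plt.subplots_adjust(bottom=0.2)

# The slider sets a threshold on the fourth variable.
slider_ax = fig.add_axes([0.2, 0.05, 0.6, 0.03])
slider = Slider(slider_ax, "max w", 0.0, 1.0, valinit=1.0)

def draw(val):
    # Redraw the scatter, keeping only points whose w falls under the slider.
    ax.clear()
    mask = w <= slider.val
    ax.scatter(x[mask], y[mask], z[mask], c=w[mask], cmap="viridis")
    ax.set(xlabel="x", ylabel="y", zlabel="z")
    fig.canvas.draw_idle()

slider.on_changed(draw)
draw(None)
plt.show()
```

Coloring the points by w while also filtering on it gives two visual handles on the fourth variable at once, which is roughly the effect the sliding-scale approach is after.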