#machine learning


Giving AI a lesson in humanity

AI is functional, but can it be friendly? Soul Machines believes so. They’re a company working to enhance the way we interact with systems like virtual tutors and digital assistants that are powered by AI. Soul Machines taps the talents of AI researchers, neuroscientists, psychologists, artists, and other innovative thinkers to help create personable “digital humans” that run on the backbone of IBM Watson—specifically, Watson Assistant, which powers the AI’s dialog interface. By creating approachable AI-powered helpers, Soul Machines is helping ensure you’ll walk away from machine interactions with a smile on your face.

Learn how Watson helps power approachable AI ->



Minimizing laser phase noise with machine learning

Ultra-precise lasers can be used for optical atomic clocks, quantum computers, power cable monitoring, and much more. But all lasers make noise, which researchers from DTU Fotonik want to minimize using machine learning.

The perfect laser does not exist. There will always be a bit of phase noise because the laser light frequency moves back and forth a little. Phase noise prevents the laser from producing light waves with the perfect steadiness that is otherwise a characteristic feature of the laser.

Most of the lasers we use on a daily basis do not need to be completely precise. For example, it is of no importance whether the frequency of the red laser light in the supermarket barcode scanners varies slightly when reading the barcodes. But for certain applications—for example in optical atomic clocks and optical measuring instruments—it is absolutely crucial that the laser is stable so that the light frequency does not vary.

One way of getting closer to an ultra-precise laser is to determine the phase noise. This may enable you to find a way of compensating for it, so that the result is a purer and more accurate laser beam.

Read more.



Thanks to machine learning, the future of catalyst research is now

To date, research in the field of combinatorial catalysts has relied on serendipitous discoveries of catalyst combinations. Now, scientists from Japan have streamlined a protocol that combines random sampling, high-throughput experimentation, and data science to identify synergistic combinations of catalysts. With this breakthrough, the researchers hope to remove the limits placed on research by reliance on chance discoveries and to see their new protocol used more often in catalyst informatics.

Catalysts, or their combinations, are compounds that significantly lower the energy required to drive chemical reactions to completion. In the field of combinatorial catalyst design, the requirement of synergy—where one component of a catalyst complements another—and the elimination of ineffective or detrimental combinations are key considerations. However, so far, combinatorial catalysts have been designed using biased data, trial-and-error, or serendipitous discoveries of combinations that worked. A group of researchers from Japan has now sought to change this trend by devising a repeatable protocol that relies on a screening instrument and software-based analysis.

Read more.



Google’s new browser experiment lets you learn about basic AI

Just how does machine learning work? You’ve probably read a primer or two on the subject, but often the best way to understand a thing is to try it out for yourself. With that in mind, check out this little in-browser experiment from Google named Teachable Machine. It’s a perfect two-minute summary of what a lot of modern AI can — and, more importantly, can’t — do.

Teachable Machine lets you use your webcam to train an extremely basic AI program. Just hit the “train green/purple/orange” buttons, and the machine will record whatever it can see through your webcam.

Once it’s “learned” enough, it’ll output whatever you like (a GIF or a sound effect or some speech) when it sees the object or activity you trained it with. Read More


afutureworththinkingabout:

I’m Not Afraid of AI Overlords— I’m Afraid of Whoever’s Training Them To Think That Way

by Damien P. Williams

I want to let you in on a secret: According to Silicon Valley’s AIs, I’m not human.

Well, maybe they think I’m human, but they don’t think I’m me. Or, if they think I’m me and that I’m human, they think I don’t deserve expensive medical care. Or that I pose a higher risk of criminal recidivism. Or that my fidgeting behaviours or culturally-perpetuated shame about my living situation or my race mean I’m more likely to be cheating on a test. Or that I want to see morally repugnant posts that my friends have commented on to call morally repugnant. Or that I shouldn’t be given a home loan or a job interview or the benefits I need to stay alive.

Now, to be clear, “AI” is a misnomer, for several reasons, but we don’t have time, here, to really dig into all the thorny discussion of values and beliefs about what it means to think, or to be a mind— especially because we need to take our time talking about why values and beliefs matter to conversations about “AI,” at all. So instead of “AI,” let’s talk specifically about algorithms, and machine learning.

Machine Learning (ML) is the name for a set of techniques for systematically reinforcing patterns, expectations, and desired outcomes in various computer systems. These techniques allow those systems to make sought-after predictions based on the datasets they’re trained on. ML systems learn the patterns in these datasets and then extrapolate them to model a range of statistical likelihoods of future outcomes.

Algorithms are sets of instructions which, when run, perform functions such as searching, matching, sorting, and feeding the outputs of any of those processes back in on themselves, so that a system can learn from and refine itself. This feedback loop is what allows algorithmic machine learning systems to provide carefully curated search responses or newsfeed arrangements or facial recognition results to consumers like me and you and your friends and family and the police and the military. And while there are many different types of algorithms which can be used for the above purposes, they all remain sets of encoded instructions to perform a function.

And so, in these systems’ defense, it’s no surprise that they think the way they do: That’s exactly how we’ve told them to think.

[Image of Michael Emerson as Harold Finch, in season 2, episode 1 of the show Person of Interest, “The Contingency.” His face is framed by a box of dashed yellow lines, the words “Admin” to the top right, and “Day 1” in the lower right corner.]


Read the rest of I’m Not Afraid of AI Overlords— I’m Afraid of Whoever’s Training Them To Think That Way at A Future Worth Thinking About

Worldie Helps Women and Girls With Technology! Our new logo update makes that even clearer, with the dark hot pink streak centered!

Today’s ratio of women leading or controlling technology recalls Lynda Carter’s Wonder Woman in the 1970s: nearly none. That has taken us backwards and hurts female public figures everywhere, much of it driven by bot farms, but Worldie and our partners supporting women will push it FORWARD.

Many women coding in artificial intelligence and machine learning struggle with being removed from crowdsourcing websites, receiving fewer reviews, and being hired less often - until we came along! We will cause the change!

Wonder Woman uses AI Robot Rover to save the plan…et!



Pix2Pix Edges2Pikachu Example

New interactive Pix2Pix visual translation example from ml5js transforms your doodles using a model trained on a Pokémon character dataset.

1. Press your mouse to draw a Pikachu on the canvas.

2. Click the ‘Transfer’ button.

3. A colored Pikachu image will appear in ~5s.

4. Click the ‘Clear’ button to clear the canvas and draw again.

You can try it out for yourself here

ml5js is an online resource for learning how to implement Machine Learning methods in JavaScript for web-based projects - you can discover more here



#generatethebible

Art project from Shirin Anlen and Ziv Schneider combines Machine Learning tech and Religion, generating visual compositions from Bible quotes using a text-to-image framework. The project used the COCO dataset (with code by Eyal Gruss).

If you click on any of the images above, you should see the quote each image is referring to, and you can follow the hashtag on Instagram here



Octi

This iOS video app employs Computer Vision and Machine Learning to recognize full-body forms and apply various effects, ranging from chromakey filters to gesture actions, plus 3D characters that can imitate your movements:

Octi is an AI video platform that sees and understands humans in your videos, allowing you to do new, exciting things with your phone camera.

• Create custom, interactive stickers of you and your friends. Insert them in cool videos to share.
• Change your body with stunning visual effects. Turn your body into diamonds!
• Learn unique body moves that will trigger instant effects in your videos. Say hello with a rainbow or make it rain with dollars.
• Collaborate on shared augmented videos with your connections - create the ultimate expression with video.
• With Octi Dittos you have your own mini clone that will copy your body moves exactly!
• Octi is at the beginning of an Augmented Video revolution - more coming soon!

More Here

EDIT:

image

One of the developers of this app, Sam Loeschen, shares potential graphical effects that could be added in the future [source]



MachineTube

Project by Redditor sinofis is a browser-based interface to make your own deepfakes. It features a small selection of trained models/personalities which you can use for images or very short videos. The video below demonstrates how to create a video (and visualizes the process of how they are put together):

More Here



Making Amazon Alexa respond to Sign Language using AI

Latest project from Abhishek Singh utilizes Machine Learning and Computer Vision to enable Sign Language as an input to Smart Speaker technology:

If voice is the future of computing what about those who cannot speak or hear? I used deep learning with TensorFlow.js to make Amazon Echo respond to sign language. 

Link



aiweirdness:

Sea Shanty Surrealism

I’ve been working with an image-generating algorithm by Vadim Epstein called CLIP+FFT, which uses OpenAI’s CLIP algorithm to judge whether images match a given caption, and an FFT algorithm to come up with new images to present to CLIP. Give it any random phrase, and CLIP+FFT will try its best to come up with a matching image. And now there’s a version that will generate images to go with several phrases in a row and then fuse them into a video.
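For readers curious about the “judge” half of this setup, here is a minimal sketch of scoring how well candidate captions match an image with CLIP, using the Hugging Face transformers port (this is an illustration, not the exact CLIP+FFT notebook pipeline; the filename is hypothetical):

```python
# Minimal sketch: scoring caption/image match with CLIP via Hugging Face
# transformers. This shows CLIP as the judge; the FFT part of CLIP+FFT is the
# separate generator that proposes candidate images to be scored like this.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("candidate_frame.png")  # hypothetical candidate image
captions = ["soon may the Wellerman come", "a bowl of fruit"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # similarity score per caption
print(logits.softmax(dim=-1))  # which caption CLIP thinks matches best
```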

Here’s the sea shanty The Wellerman, sung by Nathan Evans, Jonny Stewart, and others, and illustrated by CLIP+FFT.

Now, there are several interesting things going on here, once you get past the sheer AI fever dream horror of it. One thing you’ll notice is that I changed some of the lines from the standard lyrics. CLIP+FFT deals with each line independently, so even if we have been talking about a ship and a whale throughout the song, the AI doesn’t know that in “when down on her a right whale bore”, the “her” refers to a ship. I made similar tweaks in one or two places.

There was nothing I could do about the line “One day, when the tonguing is done”. Trying to be more precise about the whaling sense of “tonguing” would, if anything, have made the image more horrifying.

Having none of the “Wellerman is a ship” context, the AI interprets The Wellerman itself as some kind of eldritch oil well drilling supervillain.

I kind of like what happened to “The winds blew hard, her bow dipped down,” with golden locks of hair and bows everywhere. I mean, I like it in a “oh no this has gone terribly yet fascinatingly wrong” sort of way.

The image for “We’ll take our leave and go” is also interesting, since it illustrates “leave” in so many ways. Sometimes there are cars and suitcases, or people shaking hands. Interestingly, I see hints of European Union flags and British flags in many of them, signs that during training CLIP was learning to associate “leave” with Brexit.

The “bully boys” are hilarious: classic glowering expressions and mean-kid haircuts. The AI is not used to the early-1900s meaning of “bully = awesome”.

You’ll notice that many of the frames have text, which I find charming, as if the AI is frowning to itself and muttering “tea. tea. Billy. tea.” or “blow. blow.” The less interpretable the phrase is in image form, the more likely the AI is to use text instead.

In fact, CLIP treating the word and the object as equivalent has led to an interesting way of fooling its image recognition capabilities:

I also had CLIP+FFT illustrate The Twelve Days of Christmas and this is one of my favorite frames from it: Ten Lords A-Leaping

To see the other illustrated Days of Christmas (including the weirdly human-faced swans), become a supporter of AI Weirdness! Or become a free subscriber to get new AI Weirdness posts in your inbox.

Every visual iteration of “the tonguing” is deeply unsettling. I love it.

reachartwork:

“assorted still lives of nothing in particular”

in order, “nothing”, “a beach”, “the moon”, “an abandoned building”, “a bowl of fruit and a mirror”

reachartwork:

“assorted still lives of nothing in particular” pt 2

in order, “a fruit and some bones”, “all quiet on the western front”, “a bouquet of colorful flowers”, “a bowl of fruit that looks like a mountain range”, “a concrete megastructure”, “an acropolis landscape”

Starts With A Bang Podcast #69 - Machine Learning In Astronomy

When you think about how astronomy works, you probably think about observers pointing telescopes at objects, collecting data about their properties, and then analyzing that data to determine what those objects are truly like, and to infer what they can teach or show us about the Universe. But that’s a rather old-fashioned way of doing things: one that’s contingent on there being enough astronomers to examine all of that data manually. What do we do in this new era of big data in astronomy, where there aren’t enough astronomers on Earth to even look at all of the data by hand?

The way we deal with it is fascinating, and involves a mix of statistics, classical analysis and categorization, and novel techniques like machine learning and simulating mock catalogues to “train” an artificial intelligence. Perhaps the most exciting aspect is how thoroughly the best of these applications continuously outperform, in both quality and speed, any of the manual techniques we’ve used previously. Here to walk us through this exciting and emerging field of machine learning in astronomy is Sankalp Gilda, PhD candidate and astronomer from the University of Florida.

We’ve got a great 90 minutes here for you, so buckle up and enjoy the ride!

ffanalytics:

Introduction

The goal of this research is to find further evidence for the benefits of distributed mentoring. Distributed mentoring is “a kind of mentoring that is uniquely suited to networked communities, where people of all ages and experience levels engage with and support one another through a complex, interwoven tapestry of interactive, cumulatively sophisticated advice and informal instruction” [1]. This involves multiple kinds of feedback exchanged between many mentors and mentees. In this research project, we used machine learning to classify Fanfiction.net reviews by their category within distributed mentoring theory.

Earlier research in our group published in the paper ‘More Than Peer Production: Fanfiction Communities as Sites of Distributed Mentoring’ has outlined 13 categories that were observed in Fanfiction.net reviews [2]. We used shallow positive, targeted positive, targeted constructive, and targeted positive & constructive for this analysis, as they are the four mutually exclusive codes. Table 1 below provides a formal description and percentage of reviews for each of the categories [2].

Table 1: Description and Percentage of Categories (based on 4500 reviews)

(Note: percentages add up to more than 100% because a review could be in multiple categories).

An example of a shallow positive review is “Great story!”, targeted positive is “I loved the character development of James”, and a targeted constructive review is “You could have described the battle scene better!” Targeted positive & constructive reviews contain both targeted positive and targeted constructive comments.

Our overarching research question is “Do certain review categories correlate with various attributes of distributed mentoring?” For example, we want to explore whether substantive, targeted reviews improve authors’ writing. This research would be beneficial to the fanfiction community, as it would provide an outline to members of the community on how to effectively impact and interact with authors. The theory of distributed mentoring is an applicable framework to use, as it discusses the effect of networked communities. To apply this theory, we used the public reviews available in the fanfiction community. Since there are numerous types of reviews, we used the codes listed in Table 1 to classify the reviews.

To classify all Fanfiction.net reviews, roughly 177 million, we explored machine learning classification, as manual coding would be impossible. Classification is a process of predicting the review category for a given set of reviews.

Our goal for this blog was to find the best machine learning model for review classification. We could then use this model to expand our results to the entire Fanfiction.net reviews dataset. Our baseline classification tool was ALOE (Affect Labeler of Expressions), an open source tool developed to train and test machine learning classifiers to automatically label chat messages with different emotion or affect categories [3]. In addition, we attempted various algorithms such as logistic regression, support vector machines, and Naive Bayes. This blog post discusses our approach to running ALOE as well as creating each of the aforementioned machine learning models.


Dataset

To conduct machine classification, we required data to train the model to learn how reviews relate to a certain category. We leveraged a dataset manually classified by previous participants in the UW Human-Centered Data Science Lab research group. Our dataset contained ~8000 manually classified reviews.


Method

The measures of success for performance were accuracy, precision, and recall. Accuracy is the fraction of correct predictions. This measure, however, can be misleading in classification problems. In the field of data science, we call a positive value true and a negative value false. In this case the value is positive if the review corresponds to the category in question and false otherwise. For example, if a dataset has 99 positive data points and 1 negative data point, a model that predicts only positive would receive a 0.99 accuracy. Therefore, we also used precision and recall to provide a holistic perspective. Precision asks “how many negative data points did I include in my list of positively predicted examples?”, and recall asks “how many positive data points did I miss?”. An average range for precision and recall is 0.6 - 0.7. Anything below 0.6 may signify that the results are not valid, while anything above 0.7 is generally considered a very good score that validates our accuracy.
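As a quick sketch of the imbalance point above (not code from the project), the always-positive model on a 99-to-1 dataset looks great on accuracy, which is exactly why precision and recall are needed:

```python
# Sketch: on a 99-to-1 imbalanced dataset, a model that only predicts
# "positive" gets 0.99 accuracy, so accuracy alone is misleading.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1] * 99 + [0]   # 99 positive data points, 1 negative
y_pred = [1] * 100        # a model that predicts only positive

print(accuracy_score(y_true, y_pred))   # 0.99
print(precision_score(y_true, y_pred))  # 0.99 (the one negative slipped in)
print(recall_score(y_true, y_pred))     # 1.0  (no positives were missed)
```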


Figure 1: Image from Wikipedia visually describing Precision and Recall





1. ALOE

We were able to run ALOE by following the documentation at https://github.com/etcgroup/aloe.

2. Other Classifiers

2.1 Logistic Regression

Logistic Regression is a method commonly used when the output of the model is a category. We experimented with multiple different parameters and sought the set of parameters that yielded the best result from the model.

2.2 Naive Bayes

Naive Bayes is a family of machine learning classifiers based on applying Bayes’ theorem to calculate certain probabilities. We explored 3 types of Naive Bayes classifiers on the four categories of data: the Gaussian, Bernoulli, and Multinomial Naive Bayes methods.

2.3 Support Vector Machine (SVM)

SVM is a method to find the best division between two classes. We explored three different SVM models: default, linear, and optimal. We used a parameter search to find the best parameters for each of these models.
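To make the comparison concrete, here is a minimal sketch of training and scoring the three classifier families above on a binary review-category task with scikit-learn; the toy reviews and labels are hypothetical stand-ins for the lab’s dataset, not the actual pipeline:

```python
# Sketch (hypothetical data): comparing the classifiers discussed above on a
# binary "is this review targeted?" task, using TF-IDF text features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

reviews = ["Great story!",
           "I loved the character development of James",
           "You could have described the battle scene better!",
           "Please update soon!"]
labels = [0, 1, 1, 0]  # hypothetical: 1 = targeted, 0 = not targeted

X = TfidfVectorizer().fit_transform(reviews)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.5, random_state=42)

for name, clf in [("Logistic Regression", LogisticRegression()),
                  ("Multinomial Naive Bayes", MultinomialNB()),
                  ("Linear SVM", SVC(kernel="linear"))]:
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print(name,
          precision_score(y_te, pred, zero_division=0),
          recall_score(y_te, pred, zero_division=0))
```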

When using the four categories defined above, we received low precision and recall scores for targeted constructive and targeted positive & constructive. Hence we decided to combine the three targeted categories in order to solidify our results. This is because there are very few reviews in the dataset for the latter two categories, and all targeted categories qualify as “substantive” since they provide specific feedback to authors. Consequently, we decided to add the update encouragement category, as 27.6% of our dataset is classified as this code. Update encouragement is a category that represents all reviews that encourage the author to write more [2]. These changes enable a more accurate comparison between the various models.


Results

After these changes, we got the following results for our models on shallow positive, targeted, and update encouragement. All values are proportions on a scale from 0 to 1.





Conclusion

We will expand these results by classifying the entire Fanfiction.net dataset, 177 million reviews, using Optimal SVM to predict shallow positive and update encouragement reviews and ALOE to predict targeted reviews. After that, we plan to proceed with our analysis of the relationship between these review categories and attributes of distributed mentoring such as improvement of writing and participation rate. As a starting point, we will explore whether targeted reviews impact authors’ lexical diversity - an indicator of improvement in the authors’ writing and of learning gains from online informal learning. Additionally, we will brainstorm other metrics to measure learning and distributed mentoring. Overall, we are delighted that our changes gave positive results and that we were able to create models that performed better than our baseline, ALOE. A better model means we can more accurately classify reviews and expand our results to provide a blueprint to the fanfiction community on how to effectively impact and interact with authors.

Citations

  1. Aragon C. Distributed Mentoring in Fanfiction Communities. Human-Centered Data Science Lab, Depts.washington.edu. https://depts.washington.edu/hdsl/research/distributed-mentoring/. Published 2019. Accessed June 5, 2019.
  2. Evans, S., Davis, K., Evans, A., Campbell, J. A., Randall, D. P., Yin, K., & Aragon, C. (2017, February). More than peer production: fanfiction communities as sites of distributed mentoring. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (pp. 259-272). ACM.
  3. Brooks M. etcgroup/aloe. GitHub. https://github.com/etcgroup/aloe.

Hello fanfic friends! In the past few months, our research group has done several research projects about fanfiction communities. We have written our research results into a collection of blog posts. Starting this week, we are going to post once a week to share our findings. Stay tuned for more posts!!!


Water Color Cowboys

Made with multiple passes of the same media through EBsynth.
Musical accompaniment made by feeding Western-themed music to midiMe and arranging the results in Kaossilator.

What’s the yams?


The First-Person Industrial Complex by Laura Bennett
The Internet prizes the harrowing personal essay. But sometimes telling your story comes with a price.

This is a key problem with the new first-person economy: the way it incentivizes knee-jerk, ideally topical self-exposure, the hot take’s more intimate sibling. The mandate at xoJane, according to Carroll, was: the more “shameless” an essay, the better. Carroll describes how “internally crushing” it became to watch her inbox get flooded every day with the darkest moments in strangers’ lives: “eating disorders, sexual assault, harassment, ‘My boyfriend’s a racist and I just realized it.’ ” After a while, Carroll said, the pitches began to sound as if they were all written in the same voice: “immature, sort of boastful.” Tolentino, who worked as an editor at the Hairpin before Jezebel, characterizes the typical Jezebel pitch as the “microaggression personal essay” or “My bikini waxer looked at me funny and here’s what it says about women’s shame,” and the typical Hairpin pitch as “I just moved to the big city and had a beautiful coffee shop encounter, and here’s what it says about urban life.”

It’s harder than ever to weigh the ethics of publishing these pieces against the market forces that demand them, especially as new traffic analytics make it easy to write and edit with metrics in mind. “I’ve always loved unvarnished, almost performative, extemporaneous bloggy writing,” Gould says. “But now an editor will be like, can you take this trending topic and make it be about you?” Sarah Hepola, who edits Salon’s personal essays, says that the question “What am I doing to these writers?” is always in the back of her mind: “I try to warn them that their Internet trail will be ‘I was a BDSM person,’ and they did it for $150.” But editors’ best efforts aside, this is, more than anything, a labor problem—writers toiling at the whims of a system with hazardous working conditions that involve being paid next to nothing and guaranteed a lifetime of SEO infamy. The first-person boom, Tolentino says, has helped create “a situation in which writers feel like the best thing they have to offer is the worst thing that ever happened to them.”

I really enjoyed reading the reddit comments on this piece. /u/walker6168 criticized the article, writing, “I felt like the author didn’t want to go the extra mile. It doesn’t quite condemn the practice for being exploitative and taking advantage of people who have had terrible experiences. It doesn’t address the huge risk that comes with the format: verifying the story, like with the Rolling Stones UVA article. Nor does it really engage with the format’s desire to distort every tragedy into a politically correct format.” /u/smeethu countered, “I agree that the author didn’t go all the way and condemn the practice, but she still went into enough depth to make me explore its nuances. What I find is that these people are being exploited, but they are also exploiting themselves. If you are a starving freelance writer who is behind on rent, you know you need to get paid. Writing a shocking personal essay is one way to guarantee that. And it sells for the same reason people tune in to reality TV: we enjoy exploring the dark parts of our lives and it’s entertaining.” I feel like that argument is also used for other exploitive practices, like factories and sweatshops (i.e. the people who work there are happy to have found work at all). I think the way our society is structured encourages exploitation through commodification. We’re commodifying people’s experiences and are meant to feel okay about it because they’re supposed to speak to some universally relatable theme.

On a similar note, /u/DevFRus wrote, “At what point do such first-person essays stop being empowering and become a circus side-show? It seems to me like it is becoming less and less about giving people who had no voice before a voice, and more and more about exploiting those people for clicks. I wish the author engaged more critically with these aspects of the industry.” I think the question of when things stop being empowering is really important. It may feel empowering for someone to bare their heart in the moment, but does that mean true consent when the underlying system is exploitive? It may feel empowering for a woman to dress in provocative clothing, but is that truly making a statement in a culture steeped in compulsory sexuality and the sexual objectification of female bodies? When does the individual need to step back and consider the system rather than individual empowerment?


How big data is unfair by Moritz Hardt
Understanding sources of unfairness in data driven decision making

As we’re on the cusp of using machine learning for rendering basically all kinds of consequential decisions about human beings in domains such as education, employment, advertising, health care and policing, it is important to understand why machine learning is not, by default, fair or just in any meaningful way.

This runs counter to the widespread misbelief that algorithmic decisions tend to be fair, because, y’know, math is about equations and not skin color. […] I’d like to refute the claim that “machine learning is fair by default”. I don’t mean to suggest that machine learning is inevitably unfair, but rather that there are powerful forces that can render decision making that depends on learning algorithms unfair.

[…] a learning algorithm is designed to pick up statistical patterns in training data. If the training data reflect existing social biases against a minority, the algorithm is likely to incorporate these biases. This can lead to less advantageous decisions for members of these minority groups. Some might object that the classifier couldn’t possibly be biased if nothing in the feature space speaks of the protected attribute, e.g., race. This argument is invalid. After all, the whole appeal of machine learning is that we can infer absent attributes from those that are present. Race and gender, for example, are typically redundantly encoded in any sufficiently rich feature space whether they are explicitly present or not. They are latent in the observed attributes and nothing prevents the learning algorithm from discovering these encodings. In fact, when the protected attribute is correlated with a particular classification outcome, this is precisely what we should expect. There is no principled way to tell at which point such a correlation is worrisome and in what cases it is acceptable.

My knee-jerk reaction when reading the article title was, “What? How can an algorithm be unfair?” It’s interesting to have forgotten about the inherent biases in the data itself.
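Hardt’s redundant-encoding point is easy to demonstrate on synthetic data. The sketch below (all names and numbers hypothetical) trains a classifier that never sees the protected attribute, yet recovers it from a correlated proxy feature:

```python
# Sketch: a protected attribute can be redundantly encoded in other features.
# Synthetic data: "zip_code" correlates with "group"; a classifier trained
# only on non-protected features still predicts the protected attribute well.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)             # protected attribute (never a feature)
zip_code = group + rng.normal(0, 0.3, n)  # proxy feature correlated with group
income = rng.normal(50, 10, n)            # unrelated feature
X = np.column_stack([zip_code, income])   # feature space excludes "group"

X_tr, X_te, g_tr, g_te = train_test_split(X, group, random_state=0)
clf = LogisticRegression().fit(X_tr, g_tr)
print("accuracy recovering protected attribute:", clf.score(X_te, g_te))  # ~0.95
```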


Verge Fiction: The Date by Emily Yoshida
The kid couldn’t have been older than 24, but there was a deep, distant fatigue to his face, and dark shadows lined his eyes. As he stared down at the tablet his face went slack, as if momentarily hypnotized by its glow. He took a sip of Red Bull Yellow Edition and handed the tablet back to me, this time with a new document labeled STUDY OUTLINE.

“So if you read through that, you’ll get the basic gist of it,” he said matter-of-factly. “Basically, you’re going to be contacted by a number of brands over the duration of the test period, and you’re to react as you normally would; you’re free to ignore them, or take advantage of whatever offers or promotions they have going on. Totally up to you. These may show up on email, Facebook, any social network you’ve provided us with — and as you’ll see in the release form in a second, you do get compensated more for every account you sign over to us. At the end of the study you’ll be asked to report how many brands contacted you, and we’ll check it against our own records. There is also a possibility that you will be a placebo subject — that no brands will contact you.”

[…] By the time I walked out the door I had had enough Pinot Grigio in me to feel sufficiently light on my feet about this whole adventure. All right, this is what you are doing now, I kept repeating in my head. You are in the world and you are letting yourself be changed by it, and that is normal and fun. The Jam Cellar was walking distance to my apartment, and as I made my way down there I listened to a playlist I had made for myself on Apple Music on my new fancy wireless headphones.

Every fifth step I felt my heart wobble a little as I remembered the picture of Marcus and that corgi. He had two other photos that I had stared at in between our chats — one of him sitting at a brunch spot drinking some kind of complicated looking cocktail out of a hammered copper mug, the other of him at the beach during sunset, in silhouette from behind as he ran toward the water. You couldn’t even see his face. He was willing to use a whole picture slot for something that didn’t even show his face. I liked that.

A terrifying, if a bit hokey, glimpse at the role of brands in our lives.


randomslasher:

omghotmemes:

See you in 1977

It is literally impossible to pick a favorite line from this because the entire thing is gold. 



Kristian Aune, Tech Product Manager, Verizon Media

In the previous update, we mentioned Improved Slow Node Tolerance, Multi-Threaded Rank Profile Compilation, Reduced Peak Memory at Startup, Feed Performance Improvements, and Increased Tensor Performance. This month, we’re excited to share the following updates:

Support for Approximate Nearest Neighbor Vector Search 

Vespa now supports approximate nearest neighbor search, which can be combined with filters and text search. By using a native implementation of the HNSW algorithm, Vespa provides state-of-the-art performance on vector search: typical single-digit millisecond response times while searching hundreds of millions of documents per node. Uniquely, it also allows vector query operators to be combined efficiently with filters and text search, which is usually a requirement for real-world applications such as text search and recommendation. Vectors can be updated in real time with a sustained write rate of a few thousand vectors per node per second. Read more in the documentation on nearest neighbor search.
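As a rough illustration of what such a combined query can look like from a client, here is a sketch sent to a local Vespa instance over its HTTP search API; the document type, the tensor field name `embedding`, the `title` filter field, and the exact tensor literal syntax are assumptions for the example (check the Vespa docs for your schema and version), not details from this post:

```python
# Sketch (assumed schema): an approximate nearest neighbor query combined
# with a text filter, sent to a local Vespa instance's search API.
import requests

query_vector = [0.1, 0.2, 0.3]  # hypothetical query embedding

params = {
    # nearestNeighbor over an assumed dense tensor field "embedding",
    # ANDed with a regular text match on an assumed "title" field.
    "yql": 'select * from sources * where '
           '([{"targetNumHits": 10}]nearestNeighbor(embedding, query_embedding)) '
           'and title contains "news";',
    # Query tensor syntax varies by Vespa version; see the documentation.
    "ranking.features.query(query_embedding)": str(query_vector),
    "hits": 10,
}
response = requests.get("http://localhost:8080/search/", params=params)
print(response.json())
```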

Streaming Search Speedup

Streaming Search is a feature unique to Vespa. It is optimized for use cases like personal search and e-mail search - but is also useful in high-write applications querying a fraction of the total data set. With #13508, read throughput from storage increased up to 5x due to better parallelism.

Rank Features

  • The (Native)fieldMatch rank features are optimized to use less CPU query time, improving query latency for Text Matching and Ranking
  • The new globalSequence rank feature is an inexpensive global ordering of documents in a system with stable system state. For a system where node indexes change, this is inaccurate. See globalSequence documentation for alternatives.

GKE Sample Application

Thank you to Thomas Griseau for contributing a new sample application for Vespa on GKE, which is a great way to start using Vespa on Kubernetes.

About Vespa: Largely developed by Yahoo engineers, Vespa is an open source big data processing and serving engine. It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Verizon Media Ad Platform. Thanks to feedback and contributions from the community, Vespa continues to grow.

We welcome your contributions and feedback (tweet or email) about any of these new features or future improvements you’d like to request.

Kristian Aune, Tech Product Manager, Verizon Media

In the April updates, we mentioned Improved Performance for Large Fan-out Applications, Improved Node Auto-fail Handling, CloudWatch Metric Import and CentOS 7 Dev Environment. This month, we’re excited to share the following updates:

Improved Slow Node Tolerance

To improve query scaling, applications can group content nodes to balance static and dynamic query cost. The largest Vespa applications use a few hundred nodes. This is a great feature to optimize cost vs performance in high-query applications. Since Vespa-7.225.71, the adaptive dispatch policy is made default. This balances load to the node groups based on latency rather than just round robin - a slower node will get less load and overall latency is lower.

Multi-Threaded Rank Profile Compilation

Queries use a rank profile to score documents. Rank profiles can be huge, for example machine-learned models. The models are compiled and validated when deployed to Vespa. Since Vespa-7.225.71, the compilation is multi-threaded, cutting compile time to 10% for large models. This makes content node startup quicker, which is important for rolling upgrades.

Reduced Peak Memory at Startup

Attributes are a unique Vespa feature used for high feed performance in low-latency applications. They enable writing directly to memory for immediate serving. At restart, these structures are reloaded. Since Vespa-7.225.71, the largest attribute is loaded first, to minimize temporary memory usage. As memory is sized for peak usage, this cuts content node size requirements for applications with large variations in attribute size. Applications should keep memory at less than 80% of AWS EC2 instance size.

Feed Performance Improvements

At times, batches of documents are deleted. This subsequently triggers compaction. Since Vespa-7.227.2, compaction is blocked at high removal rates, reducing overall load. Compaction resumes once the remove rate is low again. 

Increased Tensor Performance 

Tensor is a field type used in advanced ranking expressions, with heavy CPU usage. Simple tensor joins are now optimized and more optimizations will follow in June.

About Vespa: Largely developed by Yahoo engineers, Vespa is an open source big data processing and serving engine. It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Verizon Media Ad Platform. Thanks to feedback and contributions from the community, Vespa continues to grow.

We welcome your contributions and feedback (tweet or email) about any of these new features or future improvements you’d like to request.

Kristian Aune, Tech Product Manager, Verizon Media

In the previous update, we mentioned Ranking with LightGBM Models, Matrix Multiplication Performance, Benchmarking Guide, Query Builder and Hadoop Integration. This month, we’re excited to share the following updates:

Improved Performance for Large Fan-out Applications

Vespa container nodes execute queries by fanning out to a set of content nodes, which evaluate parts of the data in parallel. When the fan-out or the partial results from each node are large, this can cause bandwidth to run out. Vespa now provides an optimization which lets you control the tradeoff between the size of the partial results and the probability of getting a 100% global result. As it works out, tolerating a small probability of less than 100% correctness gives a large reduction in network usage. Read more.

Improved Node Auto-fail Handling

Whenever content nodes fail, data is auto-migrated to other nodes. This consumes resources on both sender and receiver nodes, competing with resources used for processing client operations. Starting with Vespa-7.197, we have improved operation and thread scheduling, which reduces the impact on client document API operation latencies when a node is under heavy migration load.

CloudWatch Metric Import

Vespa metrics can now be pushed or pulled into AWS CloudWatch. Read more in monitoring

CentOS 7 Dev Environment

A development environment for Vespa on CentOS 7 is now available. This ensures that the turnaround time between code changes and running unit tests and system tests is short, and makes it easier to contribute to Vespa.

About Vespa: Largely developed by Yahoo engineers, Vespa is an open source big data processing and serving engine. It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Verizon Media Ad Platform. Thanks to feedback and contributions from the community, Vespa continues to grow.

We welcome your contributions and feedback (tweet or email) about any of these new features or future improvements you’d like to request.

Kristian Aune, Tech Product Manager, Verizon Media

After being made aware of the COVID-19 Open Research Dataset Challenge (CORD-19), where AI experts have been asked to create text and data mining tools that can help the medical community, the Vespa team wanted to contribute. 

Given our experience with big data at Yahoo (now Verizon Media) and creating Vespa (open source big data serving engine), we thought the best way to help was to index the dataset, which includes over 44,000 scholarly articles, and to make it available for searching via Vespa Cloud.

Now live at https://cord19.vespa.ai, you can get started with a few of the sample queries or, for more advanced queries, visit CORD-19 API Query. Feel free to tweet us @vespaengine or submit an issue, if you have any questions or suggestions.

Please expect daily updates to the documentation and query features. Contributions are appreciated - please refer to our contributing guide and submit PRs. You can also download the application, index the data set, and improve the service. More info here on how to run Vespa.ai on your own computer. 

Kristian Aune, Tech Product Manager, Verizon Media

In the January Vespa product update, we mentioned Tensor Operations, New Sizing Guides, Performance Improvements for Matched Elements in Map/Array-of-Struct, and Boolean Query Optimizations. This month, we’re excited to share the following updates:

Ranking with LightGBM Models

Vespa now supports LightGBM machine learning models in addition to ONNX, Tensorflow and XGBoost. LightGBM is a gradient boosting framework that trains fast, has a small memory footprint, and provides similar or improved accuracy to XGBoost. LightGBM also supports categorical features.
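For a flavor of the training side, here is a minimal LightGBM sketch on synthetic data; the export-to-JSON step reflects the general pattern of handing Vespa a dumped model, with the exact application-package path and rank-profile wiring left to the Vespa documentation (assumed here, not taken from this post):

```python
# Sketch: train a small LightGBM regression model and dump it to JSON, the
# form a serving system like Vespa can import (deployment details assumed;
# see the Vespa docs for where the file goes in the application package).
import json
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = X[:, 0] * 2 + X[:, 1] - X[:, 2] + rng.normal(0, 0.1, 1000)

model = lgb.train(
    {"objective": "regression", "num_leaves": 15, "learning_rate": 0.1},
    lgb.Dataset(X, label=y),
    num_boost_round=50,
)

with open("lightgbm_model.json", "w") as f:
    json.dump(model.dump_model(), f)  # JSON dump of the trained model
```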

Matrix Multiplication Performance

Vespa now uses OpenBLAS for matrix multiplication, which improves performance in machine-learned models using matrix multiplication.

Benchmarking Guide

Teams use Vespa to implement applications with strict latency requirements and minimal cost. In January, we released a new sizing guide. This month, we’re adding a benchmarking guide that you can use to find the perfect spot between cost and performance.

Query Builder

Thanks to contributions from yehzu, Vespa now has a fluent library for composing queries - explore the client module for details.

Hadoop Integration

Vespa is integrated with Hadoop and easy to feed from a grid. The grid integration now also supports conditional writes, see #12081

We welcome your contributions and feedback (tweet or email) about any of these new features or future improvements you’d like to request.

About Vespa: Largely developed by Yahoo engineers, Vespa is an open source big data processing and serving engine. It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Verizon Media Ad Platform. Thanks to feedback and contributions from the community, Vespa continues to grow.
