#machine learning
I’m Not Afraid of AI Overlords— I’m Afraid of Whoever’s Training Them To Think That Way
by Damien P. Williams
I want to let you in on a secret: According to Silicon Valley’s AIs, I’m not human.
Well, maybe they think I’m human, but they don’t think I’m me. Or, if they think I’m me and that I’m human, they think I don’t deserve expensive medical care. Or that I pose a higher risk of criminal recidivism. Or that my fidgeting behaviours or culturally-perpetuated shame about my living situation or my race mean I’m more likely to be cheating on a test. Or that I want to see morally repugnant posts that my friends have commented on to call morally repugnant. Or that I shouldn’t be given a home loan or a job interview or the benefits I need to stay alive.
Now, to be clear, “AI” is a misnomer, for several reasons, but we don’t have time, here, to really dig into all the thorny discussion of values and beliefs about what it means to think, or to be a mind— especially because we need to take our time talking about why values and beliefs matter to conversations about “AI,” at all. So instead of “AI,” let’s talk specifically about algorithms, and machine learning.
Machine Learning (ML) is the name for a set of techniques for systematically reinforcing patterns, expectations, and desired outcomes in various computer systems. These techniques allow those systems to make sought after predictions based on the datasets they’re trained on. ML systems learn the patterns in these datasets and then extrapolate them to model a range of statistical likelihoods of future outcomes.
Algorithms are sets of instructions which, when run, perform functions such as searching, matching, sorting, and feeding the outputs of any of those processes back in on themselves, so that a system can learn from and refine itself. This feedback loop is what allows algorithmic machine learning systems to provide carefully curated search responses or newsfeed arrangements or facial recognition results to consumers like me and you and your friends and family and the police and the military. And while there are many different types of algorithms which can be used for the above purposes, they all remain sets of encoded instructions to perform a function.
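To make the feedback loop concrete, here is a toy sketch, with entirely invented items and click behavior, of a “recommender” that ranks by past clicks and then treats clicks on its own ranking as new training signal:

```python
from collections import Counter

# A toy feedback loop: a "recommender" ranks items by past clicks, then
# learns from clicks that its own ranking produced. Items and behavior
# are invented for illustration.

clicks = Counter({"item_a": 1, "item_b": 1})  # start from a near-tie

def recommend():
    # Show the most-clicked items first.
    return sorted(clicks, key=clicks.get, reverse=True)

# Simulate users who simply click whatever is ranked first:
for _ in range(100):
    clicks[recommend()[0]] += 1

print(recommend())  # ['item_a', 'item_b'] - the early leader is entrenched
```

A one-click head start becomes a hundred-click lead, because the system’s output decides what data it sees next.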
And so, in these systems’ defense, it’s no surprise that they think the way they do: That’s exactly how we’ve told them to think.
[Image of Michael Emerson as Harold Finch, in season 2, episode 1 of the show Person of Interest, “The Contingency.” His face is framed by a box of dashed yellow lines, the words “Admin” to the top right, and “Day 1” in the lower right corner.]
Read the rest of I’m Not Afraid of AI Overlords— I’m Afraid of Whoever’s Training Them To Think That Way at A Future Worth Thinking About
Sea Shanty Surrealism
I’ve been working with an image-generating algorithm by Vadim Epstein called CLIP+FFT, which uses OpenAI’s CLIP algorithm to judge whether images match a given caption, and an FFT algorithm to come up with new images to present to CLIP. Give it any random phrase, and CLIP+FFT will try its best to come up with a matching image. And now there’s a version that will generate images to go with several phrases in a row and then fuse them into a video.
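Under the hood, CLIP’s “judging” amounts to embedding the image and the caption into a shared vector space and scoring the match by cosine similarity. Here is a minimal sketch of that scoring step, with tiny made-up vectors standing in for the output of CLIP’s actual encoders:

```python
import math

# CLIP scores an image against captions by embedding both into the same
# vector space and taking cosine similarities. The encoders are large
# neural networks; these tiny made-up vectors stand in for their output.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

image_embedding = [0.2, 0.7, 0.1]   # pretend output of the image encoder
caption_embeddings = {
    "a ship at sea": [0.25, 0.65, 0.05],
    "a bowl of fruit": [0.9, 0.1, 0.3],
}

scores = {cap: cosine(image_embedding, emb)
          for cap, emb in caption_embeddings.items()}
best = max(scores, key=scores.get)
print(best)  # a ship at sea
```

The generator’s job is then just to nudge the image until this score goes up, one line of lyrics at a time.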
Here’s the sea shanty The Wellerman, sung by Nathan Evans, Jonny Stewart, and others, and illustrated by CLIP+FFT.
Now, there are several interesting things going on here, once you get past the sheer AI fever dream horror of it. One thing you’ll notice is that I changed some of the lines from the standard lyrics. CLIP+FFT deals with each line independently, so even if we have been talking about a ship and a whale throughout the song, the AI doesn’t know that in “when down on her a right whale bore”, the “her” refers to a ship. I made similar tweaks in one or two places.
There was nothing I could do about the line “One day, when the tonguing is done”. Trying to be more precise about the whaling sense of “tonguing” would, if anything, have made the image more horrifying.
Having none of the “Wellerman is a ship” context, the AI interprets The Wellerman itself as some kind of eldritch oil well drilling supervillain.
I kind of like what happened to “The winds blew hard, her bow dipped down,” with golden locks of hair and bows everywhere. I mean, I like it in a “oh no this has gone terribly yet fascinatingly wrong” sort of way.
The image for “We’ll take our leave and go” is also interesting, since it illustrates “leave” in so many ways. Sometimes there are cars and suitcases, or people shaking hands. Interestingly, I see hints of European Union flags and British flags in many of them, signs that during training CLIP was learning to associate “leave” with Brexit.
The “bully boys” are hilarious, classic glowering expressions and mean-kid haircuts. The AI is not used to the early-1900s meaning of “bully” as “awesome.”
You’ll notice that many of the frames have text, which I find charming, as if the AI is frowning to itself and muttering “tea. tea. Billy. tea.” or “blow. blow.” The less interpretable the phrase is in image form, the more likely the AI is to use text instead.
In fact CLIP treating the word and the object as equivalent has led to an interesting way of fooling its image recognition capabilities:
I also had CLIP+FFT illustrate The Twelve Days of Christmas and this is one of my favorite frames from it: Ten Lords A-Leaping
To see the other illustrated Days of Christmas (including the weirdly human-faced swans), become a supporter of AI Weirdness! Or become a free subscriber to get new AI Weirdness posts in your inbox.
Every visual iteration of “the tonguing” is deeply unsettling. I love it.
“assorted still lives of nothing in particular”
in order, “nothing”, “a beach”, “the moon”, “an abandoned building”, “a bowl of fruit and a mirror”
“assorted still lives of nothing in particular” pt 2
in order, “a fruit and some bones”, “all quiet on the western front”, “a bouquet of colorful flowers”, “a bowl of fruit that looks like a mountain range”, “a concrete megastructure”, “an acropolis landscape”
Starts With A Bang Podcast #69 - Machine Learning In Astronomy
When you think about how astronomy works, you probably think about observers pointing telescopes at objects, collecting data about their properties, and then analyzing that data to determine what those objects are truly like, and to infer what they can teach or show us about the Universe. But that’s a rather old-fashioned way of doing things: one that’s contingent on there being enough astronomers to examine all of that data manually. What do we do in this new era of big data in astronomy, where there aren’t enough astronomers on Earth to even look at all of the data by hand?
The way we deal with it is fascinating, and involves a mix of statistics, classical analysis and categorization, and novel techniques like machine learning and simulating mock catalogues to “train” an artificial intelligence. Perhaps the most exciting aspect is how thoroughly the best of these applications continuously outperform, in both quality and speed, any of the manual techniques we’ve used previously. Here to walk us through this exciting and emerging field of machine learning in astronomy is Sankalp Gilda, PhD candidate and astronomer from the University of Florida.
We’ve got a great 90 minutes here for you, so buckle up and enjoy the ride!
Introduction
The goal of this research is to find further evidence for the benefits of distributed mentoring. Distributed mentoring is “a kind of mentoring that is uniquely suited to networked communities, where people of all ages and experience levels engage with and support one another through a complex, interwoven tapestry of interactive, cumulatively sophisticated advice and informal instruction” [1]. This involves multiple kinds of feedback exchanged between many mentors and mentees. In this research project, we used machine learning to classify Fanfiction.net reviews by their category within distributed mentoring theory.
Earlier research in our group published in the paper ‘More Than Peer Production: Fanfiction Communities as Sites of Distributed Mentoring’ has outlined 13 categories that were observed in Fanfiction.net reviews [2]. We used shallow positive, targeted positive, targeted constructive, and targeted positive & constructive for this analysis, as they are the four mutually exclusive codes. Table 1 below provides a formal description and percentage of reviews for each of the categories [2].
Table 1: Description and Percentage of Categories (based on 4500 reviews)
(Note: percentages add up to more than 100% because a review could be in multiple categories).
An example of a shallow positive review is “Great story!”, targeted positive is “I loved the character development of James”, and a targeted constructive review is “You could have described the battle scene better!” Targeted positive & constructive reviews contain both targeted positive and targeted constructive comments.
Our overarching research question is “Do certain review categories correlate with various attributes of distributed mentoring?” For example, we want to explore whether substantive, targeted reviews improve authors’ writing. This research would be beneficial to the fanfiction community, as it would provide an outline to members of the community on how to effectively impact and interact with authors. The theory of distributed mentoring is an applicable framework to use, as it discusses the effect of networked communities. To apply this theory, we used the public reviews available in the fanfiction community. Since there are numerous types of reviews, we used the codes listed in Table 1 to classify the reviews.
To classify all Fanfiction.net reviews, roughly 177 million, we explored machine learning classification, as manual coding at that scale would be impossible. Classification is the process of predicting the category of a given review.
Our goal for this blog was to find the best machine learning model for review classification. We could then use this model to expand our results to the entire Fanfiction.net reviews dataset. Our baseline classification tool was ALOE (Affect Labeler of Expressions), an open source tool developed to train and test machine learning classifiers to automatically label chat messages with different emotion or affect categories [3]. In addition, we attempted various algorithms such as logistic regression, support vector machines, and Naive Bayes. This blog post discusses our approach to running ALOE as well as creating each of the aforementioned machine learning models.
Dataset
To conduct machine classification, we required data to train the model to learn how reviews relate to a certain category. We leveraged a dataset manually classified by previous participants in the UW Human-Centered Data Science Lab research group. Our dataset contained ~8000 manually classified reviews.
Method
The measures of success for performance were accuracy, precision, and recall. Accuracy is the fraction of correct predictions. This measure, however, can be misleading in classification problems. We call a data point positive if the review belongs to the category in question and negative otherwise. For example, if a dataset has 99 positive data points and 1 negative data point, a model that always predicts positive would receive a 0.99 accuracy. Therefore, we also used precision and recall to provide a more holistic perspective. Precision is the fraction of positively predicted examples that are actually positive, and recall is the fraction of actual positives that the model finds. In our setting, a typical range for precision and recall is 0.6 to 0.7. Anything below 0.6 may signify that the results are not valid, while anything above 0.7 is generally considered a very good score.
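The imbalanced example above can be made concrete. This sketch uses synthetic labels, not our study data, to show an always-positive model earning 0.99 accuracy while never once identifying the minority class:

```python
# Why accuracy alone misleads: 99 positives and 1 negative, scored against
# a model that always predicts "positive". Labels are synthetic.

y_true = [1] * 99 + [0]
y_pred = [1] * 100                     # always predict "in the category"

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)     # 0.99 - looks excellent
precision = tp / (tp + fp)             # 0.99 - predicted positives that are right
recall = tp / (tp + fn)                # 1.0  - actual positives that were found
negative_recall = tn / (tn + fp)       # 0.0  - the minority class is never caught
```

The headline numbers look strong, yet the model is useless for spotting the rare class, which is exactly the situation we faced with the sparse targeted categories.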
Figure 1: Image from Wikipedia visually describing Precision and Recall
1. ALOE
We were able to run ALOE by following the documentation at https://github.com/etcgroup/aloe.
2. Other Classifiers
2.1 Logistic Regression
Logistic Regression is a method commonly used when the output of the model is a category. We experimented with multiple parameter settings and sought the set of parameters that yielded the best results from the model.
2.2 Naive Bayes
Naive Bayes is a family of machine learning classifiers based on applying Bayes’ theorem to calculate class probabilities. We explored three types of Naive Bayes classifiers on the four categories of data: the Gaussian, Bernoulli, and Multinomial Naive Bayes methods.
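As a rough sketch of how a multinomial Naive Bayes text classifier works (the training examples below are invented, and the real analysis would typically use a library implementation):

```python
import math
from collections import Counter, defaultdict

# Minimal multinomial Naive Bayes from scratch, so the mechanics are
# visible. Training examples are invented for illustration.

train = [
    ("great story", "shallow_positive"),
    ("love it so much", "shallow_positive"),
    ("loved the character development of james", "targeted_positive"),
    ("the battle scene could be described better", "targeted_constructive"),
]

class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
vocab = set()
for text, label in train:
    for word in text.split():
        word_counts[label][word] += 1
        vocab.add(word)

def predict(text):
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # log prior + log likelihoods with add-one (Laplace) smoothing
        score = math.log(class_counts[label] / len(train))
        total = sum(word_counts[label].values())
        for word in text.split():
            score += math.log(
                (word_counts[label][word] + 1) / (total + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict("great story loved it"))  # shallow_positive
```

Each class keeps simple word counts, and a new review is assigned to whichever class makes its words most probable.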
2.3 Support Vector Machine (SVM)
SVM is a method that finds the best dividing boundary between two classes. We explored three different SVM models: default, linear, and optimal, and used a parameter search to find the best parameters for each of these models.
When using the four categories defined above, we received low precision and recall scores for targeted constructive and targeted positive & constructive, because the dataset contains very few reviews in those two categories. We therefore combined the three targeted categories into a single targeted category; all of them qualify as “substantive,” since they provide specific feedback to authors. We also added the update encouragement category, which represents reviews that encourage the author to write more [2] and covers 27.6% of our dataset. These changes enable a more accurate comparison between the various models.
Results
After these changes, we got the following results for our models on shallow positive, targeted, and update encouragement. All values are proportions on a scale from 0 to 1.
Conclusion
We will expand these results by classifying the entire Fanfiction.net dataset, 177 million reviews, using Optimal SVM to predict shallow positive and update encouragement reviews and ALOE to predict targeted reviews. We then plan to analyze the relationship between these review categories and attributes of distributed mentoring such as improvement in writing and participation rate. As a starting point, we will explore whether targeted reviews impact authors’ lexical diversity, an indicator of improvement in the authors’ writing and of learning gains from online informal learning. Additionally, we will brainstorm other metrics to measure learning and distributed mentoring. Overall, we are delighted that our changes gave positive results and that we were able to create models that performed better than our baseline, ALOE. A better model means we can more accurately classify reviews and expand our results to provide a blueprint for the fanfiction community on how to effectively impact and interact with authors.
Citations
- Aragon C. Human-Centered Data Science Lab » Distributed Mentoring in Fanfiction Communities Human-Centered Data Science Lab. Depts.washington.edu. https://depts.washington.edu/hdsl/research/distributed-mentoring/. Published 2019. Accessed June 5, 2019.
- Evans, S., Davis, K., Evans, A., Campbell, J. A., Randall, D. P., Yin, K., & Aragon, C. (2017, February). More than peer production: fanfiction communities as sites of distributed mentoring. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing(pp. 259-272). ACM.
- Brooks M. etcgroup/aloe. GitHub. https://github.com/etcgroup/aloe.
Hello fanfic friends! In the past few months, our research group has done several research projects about fanfiction communities. We have written our research results into a collection of blog posts. Starting this week, we are going to post once a week to share our findings. Stay tuned for more posts!!!
Pattern recognition
The First-Person Industrial Complex by Laura Bennett
The Internet prizes the harrowing personal essay. But sometimes telling your story comes with a price.
This is a key problem with the new first-person economy: the way it incentivizes knee-jerk, ideally topical self-exposure, the hot take’s more intimate sibling. The mandate at xoJane, according to Carroll, was: the more “shameless” an essay, the better. Carroll describes how “internally crushing” it became to watch her inbox get flooded every day with the darkest moments in strangers’ lives: “eating disorders, sexual assault, harassment, ‘My boyfriend’s a racist and I just realized it.’ ” After a while, Carroll said, the pitches began to sound as if they were all written in the same voice: “immature, sort of boastful.” Tolentino, who worked as an editor at the Hairpin before Jezebel, characterizes the typical Jezebel pitch as the “microaggression personal essay” or “My bikini waxer looked at me funny and here’s what it says about women’s shame,” and the typical Hairpin pitch as “I just moved to the big city and had a beautiful coffee shop encounter, and here’s what it says about urban life.”
It’s harder than ever to weigh the ethics of publishing these pieces against the market forces that demand them, especially as new traffic analytics make it easy to write and edit with metrics in mind. “I’ve always loved unvarnished, almost performative, extemporaneous bloggy writing,” Gould says. “But now an editor will be like, can you take this trending topic and make it be about you?” Sarah Hepola, who edits Salon’s personal essays, says that the question “What am I doing to these writers?” is always in the back of her mind: “I try to warn them that their Internet trail will be ‘I was a BDSM person,’ and they did it for $150.” But editors’ best efforts aside, this is, more than anything, a labor problem—writers toiling at the whims of a system with hazardous working conditions that involve being paid next to nothing and guaranteed a lifetime of SEO infamy. The first-person boom, Tolentino says, has helped create “a situation in which writers feel like the best thing they have to offer is the worst thing that ever happened to them.”
I really enjoyed reading the reddit comments on this piece. /u/walker6168 criticized the article, writing, “I felt like the author didn’t want to go the extra mile. It doesn’t quite condemn the practice for being exploitative and taking advantage of people who have had terrible experiences. It doesn’t address the huge risk that comes with the format: verifying the story, like with the Rolling Stone UVA article. Nor does it really engage with the format’s desire to distort every tragedy into a politically correct format.” /u/smeethu countered, “I agree that the author didn’t go all the way and condemn the practice, but she still went into enough depth to make me explore its nuances. What I find is that these people are being exploited, but they are also exploiting themselves. If you are a starving freelance writer who is behind on rent, you know you need to get paid. Writing a shocking personal essay is one way to guarantee that. And it sells for the same reason people tune in to reality TV: we enjoy exploring the dark parts of our lives and it’s entertaining.” I feel like that argument is also used for other exploitive practices, like factories and sweatshops (i.e. the people who work there are happy to have found work at all). I think the way our society is structured encourages exploitation through commodification. We’re commodifying people’s experiences and are meant to feel okay about it because they’re supposed to speak to some universally relatable theme.
On a similar note, /u/DevFRus wrote, “At what point do such first-person essays stop being empowering and become a circus side-show? It seems to me like it is becoming less and less about giving people who had no voice before a voice, and more and more about exploiting those people for clicks. I wish the author engaged more critically with these aspects of the industry.” I think the question of when things stop being empowering is really important. It may feel empowering for someone to bare their heart in the moment, but does that mean true consent when the underlying system is exploitive? It may feel empowering for a woman to dress in provocative clothing, but is that truly making a statement in a culture steeped in compulsory sexuality and the sexual objectification of female bodies? When does the individual need to step back and consider the system rather than individual empowerment?
How big data is unfair by Moritz Hardt
Understanding sources of unfairness in data driven decision making
As we’re on the cusp of using machine learning for rendering basically all kinds of consequential decisions about human beings in domains such as education, employment, advertising, health care and policing, it is important to understand why machine learning is not, by default, fair or just in any meaningful way.
This runs counter to the widespread misbelief that algorithmic decisions tend to be fair, because, y’know, math is about equations and not skin color. […] I’d like to refute the claim that “machine learning is fair by default”. I don’t mean to suggest that machine learning is inevitably unfair, but rather that there are powerful forces that can render decision making that depends on learning algorithms unfair.
[…] a learning algorithm is designed to pick up statistical patterns in training data. If the training data reflect existing social biases against a minority, the algorithm is likely to incorporate these biases. This can lead to less advantageous decisions for members of these minority groups. Some might object that the classifier couldn’t possibly be biased if nothing in the feature space speaks of the protected attribute, e.g., race. This argument is invalid. After all, the whole appeal of machine learning is that we can infer absent attributes from those that are present. Race and gender, for example, are typically redundantly encoded in any sufficiently rich feature space whether they are explicitly present or not. They are latent in the observed attributes and nothing prevents the learning algorithm from discovering these encodings. In fact, when the protected attribute is correlated with a particular classification outcome, this is precisely what we should expect. There is no principled way to tell at which point such a correlation is worrisome and in what cases it is acceptable.
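Hardt’s point about redundant encoding can be demonstrated with a few lines of synthetic data: withhold the protected attribute entirely, and a single correlated proxy feature still recovers it. Everything below is made up for illustration, with “neighborhood” standing in for any sufficiently rich feature set:

```python
import random

# Redundant encoding in miniature: the protected attribute "group" is never
# given to the classifier, but a correlated proxy feature reveals it anyway.
# All data is synthetic and purely illustrative.

random.seed(0)

def make_person():
    group = random.choice(["A", "B"])        # protected attribute, held out
    # Segregation makes neighborhood correlate strongly with group:
    if group == "A":
        neighborhood = 1 if random.random() < 0.9 else 0
    else:
        neighborhood = 0 if random.random() < 0.9 else 1
    return group, neighborhood

people = [make_person() for _ in range(10_000)]

# A trivial "classifier" that sees only the proxy feature:
correct = sum(1 for group, n in people if ("A" if n == 1 else "B") == group)
accuracy = correct / len(people)
print(round(accuracy, 2))  # roughly 0.9: the held-out attribute is recoverable
```

Dropping the sensitive column from the dataset does nothing here; the information survives in the features that remain.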
My knee-jerk reaction when reading the article title was, “What? How can an algorithm be unfair?” It’s interesting to have forgotten about the inherent biases in the data itself.
Verge Fiction: The Date by Emily Yoshida
The kid couldn’t have been older than 24, but there was a deep, distant fatigue to his face, and dark shadows lined his eyes. As he stared down at the tablet his face went slack, as if momentarily hypnotized by its glow. He took a sip of Red Bull Yellow Edition and handed the tablet back to me, this time with a new document labeled STUDY OUTLINE.
“So if you read through that, you’ll get the basic gist of it,” he said matter-of-factly. “Basically, you’re going to be contacted by a number of brands over the duration of the test period, and you’re to react as you normally would; you’re free to ignore them, or take advantage of whatever offers or promotions they have going on. Totally up to you. These may show up on email, Facebook, any social network you’ve provided us with — and as you’ll see in the release form in a second, you do get compensated more for every account you sign over to us. At the end of the study you’ll be asked to report how many brands contacted you, and we’ll check it against our own records. There is also a possibility that you will be a placebo subject — that no brands will contact you.”
[…] By the time I walked out the door I had had enough Pinot Grigio in me to feel sufficiently light on my feet about this whole adventure. All right, this is what you are doing now, I kept repeating in my head. You are in the world and you are letting yourself be changed by it, and that is normal and fun. The Jam Cellar was walking distance to my apartment, and as I made my way down there I listened to a playlist I had made for myself on Apple Music on my new fancy wireless headphones.
Every fifth step I felt my heart wobble a little as I remembered the picture of Marcus and that corgi. He had two other photos that I had stared at in between our chats — one of him sitting at a brunch spot drinking some kind of complicated looking cocktail out of a hammered copper mug, the other of him at the beach during sunset, in silhouette from behind as he ran toward the water. You couldn’t even see his face. He was willing to use a whole picture slot for something that didn’t even show his face. I liked that.
A terrifying, if a bit hokey, glimpse at the role of brands in our lives.
Kristian Aune, Tech Product Manager, Verizon Media
In the previous update, we mentioned Improved Slow Node Tolerance, Multi-Threaded Rank Profile Compilation, Reduced Peak Memory at Startup, Feed Performance Improvements, and Increased Tensor Performance. This month, we’re excited to share the following updates:
Support for Approximate Nearest Neighbor Vector Search
Vespa now supports approximate nearest neighbor search, which can be combined with filters and text search. Using a native implementation of the HNSW algorithm, Vespa provides state-of-the-art performance on vector search: typical single-digit-millisecond response times while searching hundreds of millions of documents per node. Uniquely, it also allows vector query operators to be combined efficiently with filters and text search, which is usually a requirement for real-world applications such as text search and recommendation. Vectors can be updated in real time at a sustained write rate of a few thousand vectors per node per second. Read more in the documentation on nearest neighbor search.
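To make the "combined with filters" point concrete, here is a hedged sketch of what such a query could look like against Vespa's HTTP search API. The document type `doc`, field `embedding`, boolean field `in_stock`, and rank profile `ann` are assumptions for illustration, not part of any real schema:

```python
# Sketch of a Vespa nearest-neighbor query body. The nearestNeighbor()
# YQL operator is AND-ed with an ordinary filter, and the targetHits
# annotation caps the candidates retrieved per content node.
import json

query_vector = [0.1, 0.2, 0.3]  # stand-in for a real embedding

body = {
    "yql": ("select * from doc where "
            "{targetHits: 10}nearestNeighbor(embedding, q) "
            "and in_stock = true"),
    "input.query(q)": query_vector,  # binds the query tensor "q"
    "ranking.profile": "ann",
    "hits": 10,
}

payload = json.dumps(body)
print(payload)
# This payload would be POSTed to http://<vespa-endpoint>/search/
```

The interesting part is the single YQL `where` clause: the ANN operator and the filter live in one expression, so the engine can evaluate them together rather than post-filtering a vector result set.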
Streaming Search Speedup
Streaming Search is a feature unique to Vespa. It is optimized for use cases like personal search and e-mail search, but is also useful in high-write applications querying a fraction of the total data set. With #13508, read throughput from storage increased by up to 5x due to better parallelism.
Rank Features
- The (Native)fieldMatch rank features are optimized to use less CPU at query time, improving query latency for text matching and ranking.
- The new globalSequence rank feature is an inexpensive global ordering of documents in a system with a stable node set. For a system where node indexes change, it is inaccurate. See the globalSequence documentation for alternatives.
GKE Sample Application
Thank you to Thomas Griseau for contributing a new sample application for Vespa on GKE, which is a great way to start using Vespa on Kubernetes.
…
About Vespa: Largely developed by Yahoo engineers, Vespa is an open source big data processing and serving engine. It’s in use by many products, such as Yahoo News, Yahoo Sports, Yahoo Finance, and the Verizon Media Ad Platform. Thanks to feedback and contributions from the community, Vespa continues to grow.
We welcome your contributions and feedback (tweet or email) about any of these new features or future improvements you’d like to request.
Kristian Aune, Tech Product Manager, Verizon Media
In the April updates, we mentioned Improved Performance for Large Fan-out Applications, Improved Node Auto-fail Handling, CloudWatch Metric Import and CentOS 7 Dev Environment. This month, we’re excited to share the following updates:
Improved Slow Node Tolerance
To improve query scaling, applications can group content nodes to balance static and dynamic query cost. The largest Vespa applications use a few hundred nodes. This is a great feature for optimizing cost versus performance in high-query applications. Since Vespa-7.225.71, the adaptive dispatch policy is the default. It balances load across the node groups based on latency rather than plain round-robin: a slower node receives less load, lowering overall latency.
Multi-Threaded Rank Profile Compilation
Queries use a rank profile to score documents. Rank profiles can be huge, for example machine-learned models. The models are compiled and validated when deployed to Vespa. Since Vespa-7.225.71, compilation is multi-threaded, cutting compile time to 10% for large models. This makes content-node startup quicker, which is important for rolling upgrades.
Reduced Peak Memory at Startup
Attributes are a unique Vespa feature used for high feed performance in low-latency applications, enabling writes directly to memory for immediate serving. At restart, these structures are reloaded. Since Vespa-7.225.71, the largest attribute is loaded first to minimize temporary memory usage. As memory is sized for peak usage, this cuts content-node size requirements for applications with large variations in attribute size. Applications should keep memory usage below 80% of the AWS EC2 instance size.
Feed Performance Improvements
At times, batches of documents are deleted. This subsequently triggers compaction. Since Vespa-7.227.2, compaction is blocked at high removal rates, reducing overall load. Compaction resumes once the remove rate is low again.
Increased Tensor Performance
Tensor is a field type used in advanced ranking expressions, which can be CPU-intensive. Simple tensor joins are now optimized, and more optimizations will follow in June.
…
We welcome your contributions and feedback (tweet or email) about any of these new features or future improvements you’d like to request.
Kristian Aune, Tech Product Manager, Verizon Media
In the previous update, we mentioned Ranking with LightGBM Models, Matrix Multiplication Performance, Benchmarking Guide, Query Builder and Hadoop Integration. This month, we’re excited to share the following updates:
Improved Performance for Large Fan-out Applications
Vespa container nodes execute queries by fanning out to a set of content nodes, each evaluating part of the data in parallel. When the fan-out or the partial results from each node are large, bandwidth can run out. Vespa now provides an optimization which lets you control the trade-off between the size of the partial results and the probability of getting a 100% global result. In practice, tolerating a small probability of less than 100% correctness gives a large reduction in network usage. Read more.
Improved Node Auto-fail Handling
Whenever content nodes fail, data is auto-migrated to other nodes. This consumes resources on both sender and receiver nodes, competing with resources used for processing client operations. Starting with Vespa-7.197, we have improved operation and thread scheduling, which reduces the impact on client document API operation latencies when a node is under heavy migration load.
CloudWatch Metric Import
Vespa metrics can now be pushed or pulled into AWS CloudWatch. Read more in monitoring.
CentOS 7 Dev Environment
A development environment for Vespa on CentOS 7 is now available. This ensures that the turnaround time between code changes and running unit tests and system tests is short, and makes it easier to contribute to Vespa.
We welcome your contributions and feedback (tweet or email) about any of these new features or future improvements you’d like to request.
Kristian Aune, Tech Product Manager, Verizon Media
After being made aware of the COVID-19 Open Research Dataset Challenge (CORD-19), where AI experts have been asked to create text and data mining tools that can help the medical community, the Vespa team wanted to contribute.
Given our experience with big data at Yahoo (now Verizon Media) and creating Vespa (open source big data serving engine), we thought the best way to help was to index the dataset, which includes over 44,000 scholarly articles, and to make it available for searching via Vespa Cloud.
Now live at https://cord19.vespa.ai, you can get started with a few of the sample queries, or, for more advanced queries, visit CORD-19 API Query. Feel free to tweet us @vespaengine or submit an issue if you have any questions or suggestions.
Please expect daily updates to the documentation and query features. Contributions are appreciated - please refer to our contributing guide and submit PRs. You can also download the application, index the data set, and improve the service. More info here on how to run Vespa.ai on your own computer.
Kristian Aune, Tech Product Manager, Verizon Media
In the January Vespa product update, we mentioned Tensor Operations, New Sizing Guides, Performance Improvements for Matched Elements in Map/Array-of-Struct, and Boolean Query Optimizations. This month, we’re excited to share the following updates:
Ranking with LightGBM Models
Vespa now supports LightGBM machine-learned models in addition to ONNX, TensorFlow, and XGBoost. LightGBM is a gradient boosting framework that trains fast, has a small memory footprint, and provides accuracy similar to or better than XGBoost. LightGBM also supports categorical features.
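As a rough sketch of how this is wired up (schema name, profile name, and model file name are all hypothetical here), a deployed LightGBM model file is referenced from a rank profile's ranking expression:

```
schema doc {
    rank-profile ranking_with_lightgbm {
        first-phase {
            # Evaluate a LightGBM model file deployed with the application
            expression: lightgbm("my_model.json")
        }
    }
}
```

The model is exported from LightGBM as JSON and bundled with the application package, so the same deployment step that validates the schema also compiles and validates the model.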
Matrix Multiplication Performance
Vespa now uses OpenBLAS for matrix multiplication, which improves performance in machine-learned models using matrix multiplication.
Benchmarking Guide
Teams use Vespa to implement applications with strict latency requirements and minimal cost. In January, we released a new sizing guide. This month, we’re adding a benchmarking guide that you can use to find the perfect spot between cost and performance.
Query Builder
Thanks to contributions from yehzu, Vespa now has a fluent library for composing queries - explore the client module for details.
Hadoop Integration
Vespa is integrated with Hadoop and easy to feed from a grid. The grid integration now also supports conditional writes, see #12081.
We welcome your contributions and feedback (tweet or email) about any of these new features or future improvements you’d like to request.
Understanding and manipulating articulated objects such as doors and drawers is a key skill for robots in human […]
The recent advancement in deep reinforcement learning (RL) enables solving complex high-dimensional problems in robotics. Nevertheless, effectively training […]
Novel view synthesis extrapolates a scene to a different camera viewpoint, while video prediction extrapolates to a future […]
A robot ‘chef’ has been trained to taste food at different stages of the chewing process to assess […]
Substance abuse is a serious health issue, which requires immediate attention. It brings a lot of suffering to […]