Something that I really like about the Brave Browser is that the built-in AI in the sidebar/toolbar can read the content of the websites that I'm viewing in the browser, and it can answer questions about the site. Because of that capability, I've been experimenting with prompts that could help with curation decisions.
To be clear, the idea is not to replace my own decision-making with an AI, but rather to get an independent perspective - especially with regard to things like reach, novelty, and relevance. It's an adaptation of the four-eyes principle.
So, today, I thought I'd compare how the AI evaluated a series of posts and report on the results. I would have preferred to use the "Claude 3 Haiku" model from the browser settings, but I ran out of free AI time too quickly, so most of the results are from "Llama 3.1 8B". I also ran one evaluation with Mixtral, just for comparison (but again, I ran out of free AI time).
Here's today's version of a curation prompt:
Evaluate this article in four dimensions and score each dimension from 1-10. The dimensions are:
- Novelty/surprise: Does the article provide unique information or perspective?
- Organization: Is the article structured in a way that motivates the reader to read the whole thing?
- Grammar/spelling: Are there any mistakes?
- Relevance: Is the topic relevant to a wide audience?
Also, provide an overall score from 0-10 that considers those factors and any others that you think are relevant.
Finally, I'm going to vote on 20 articles today with an average percentage of 50%. From 0-100, what percentage should I allocate to this article?
Explain your scoring.
Note that this prompt doesn't consider curation rewards at all. An improved version might explain the curation rules and have it include the potential rewards in its response.
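As far as I know, Brave doesn't expose Leo through a scriptable API, but the same prompt can be pointed at a self-hosted copy of the same model. Here's a minimal sketch in Python, assuming a local Ollama server with the llama3.1:8b model pulled; the endpoint, model tag, and article.txt file are my assumptions, not anything Brave provides:

```python
# Minimal sketch: run the curation prompt against a self-hosted model instead of
# Brave's in-browser Leo. Assumes an Ollama server on localhost:11434 with the
# llama3.1:8b model already pulled -- these are assumptions, not Brave's API.
import requests

# Condensed version of the curation prompt shown above.
CURATION_PROMPT = (
    "Evaluate this article in four dimensions (novelty/surprise, organization, "
    "grammar/spelling, relevance) and score each from 1-10. Also provide an "
    "overall score from 0-10. I'm going to vote on 20 articles today with an "
    "average percentage of 50%. From 0-100, what percentage should I allocate "
    "to this article? Explain your scoring."
)

def evaluate(article_text: str) -> str:
    """Send the prompt plus the article text to the local model and return its reply."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.1:8b",
            "prompt": f"{CURATION_PROMPT}\n\nArticle:\n{article_text}",
            "stream": False,
        },
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["response"]

if __name__ == "__main__":
    with open("article.txt", encoding="utf-8") as f:  # the post you want evaluated
        print(evaluate(f.read()))
```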
Here's what Brave's AIs think of some posts on the Steem blockchain. Five of the six posts are mine; the sixth, an anonymous trending post, was a late addition.
Model | Link | Novelty | Organization | Grammar & Spelling | Relevance | Overall | Suggested Vote (%) |
---|---|---|---|---|---|---|---|
Llama | Make Downvotes Great Again? | 6 | 8 | 9 | 7 | 7.5 | 60 |
Llama | My selections for Developer Delegation Day (DDD) | 6 | 8 | 9 | 7 | 7.5 | 60 |
Llama | True crime in the Delaware Valley: A cover-up in the case of Ellen Greenberg? | 8 | 9 | 9 | 6 | 7.8 | 60 |
Llama | New technique makes skin invisible - yes really | 8 | 9 | 10 | 8 | 8.3 | 55 |
Llama | Community Service Announcement: Pinned posts by mod-bot | 6 | 8 | 10 | 8 | 7.4 | 60 |
Mixtral | Community Service Announcement: Pinned posts by mod-bot | 7 | 8 | 9 | 7 | 31/40 | 55-60 |
Llama | Anonymous trending article | 2 | 4 | 9 | 6 | 4.5 | 20-25 |
Here's what those scores look like:
Here's what it looks like in practice, with both the article and the AI interaction in the same browser environment.
Here are the details of the responses, followed by a short wrap-up.
Dimension 1: Novelty/surprise Score: 6/10 The article presents a unique perspective on downvoting on Steem, suggesting a new approach to using the tag "#thegauntlet" to encourage accurate appraisal of content. However, the idea is not entirely new, as it is based on a concept mentioned in the past.
Dimension 2: Organization Score: 8/10 The article is well-structured, with a clear introduction, explanation of the problem, presentation of a solution, and discussion of potential objections. The author effectively uses headings and paragraphs to break up the content and make it easier to follow.
Dimension 3: Grammar/spelling Score: 9/10 The article appears to be well-written, with no major grammatical errors or spelling mistakes. However, there are a few minor errors, such as the use of American spell-checker corrections.
Dimension 4: Relevance Score: 7/10 The topic of downvoting on Steem is relevant to a specific audience, namely Steem users and content creators. However, the article may not be of interest to a wider audience.
Overall score Score: 7.5/10 The article presents a unique perspective on downvoting, is well-structured, and is free of major errors. However, the topic may not be of interest to a wide audience, and the idea is not entirely new.
Percentage allocation Score: 60% Based on the article's content and relevance, I would recommend allocating 60% of your voting percentage to this article.
Dimension 1: Novelty/Surprise Score: 6/10 The article provides a unique perspective on the Steem ecosystem and the concept of Developer Delegation Day. However, the idea of delegating STEEM Power to developers is not entirely new.
Dimension 2: Organization Score: 8/10 The article is well-structured, with a clear introduction, explanation of the concept, and a list of the author's delegations. The use of brief highlights for each developer is effective in conveying the author's reasoning.
Dimension 3: Grammar/Spelling Score: 9/10 The article appears to be well-written, with no major grammatical or spelling errors. However, there are a few minor errors, such as the use of a smiley face in the text.
Dimension 4: Relevance Score: 7/10 The topic of the article is relevant to a niche audience interested in the Steem ecosystem and blockchain development. However, the article may not be of interest to a wider audience.
Overall Score Score: 7.5/10 The article is well-written and provides a unique perspective on the Steem ecosystem. However, its relevance and novelty are limited to a specific audience.
Percentage Allocation Score: 60% Based on the article's quality and relevance, I would recommend allocating 60% of your vote to this article. This is a moderate score, reflecting the article's strengths and weaknesses.
Dimension 1: Novelty/Surprise Score: 8/10 The article provides a detailed and unique perspective on a specific true crime case, which is not commonly discussed. The author's personal involvement and updates on the case make it engaging and informative.
Dimension 2: Organization Score: 9/10 The article is well-structured, with a clear introduction, background information, and updates on the case. The author provides links to previous articles and sources, making it easy to follow the story.
Dimension 3: Grammar/Spelling Score: 9/10 The article appears to be well-written, with no noticeable grammatical or spelling errors.
Dimension 4: Relevance Score: 6/10 While the topic is specific, it may not be relevant to a wide audience. However, true crime enthusiasts and those interested in justice and law may find it engaging.
Overall Score Score: 7.8/10 The article's unique perspective, clear organization, and well-written content make it a strong contender. However, its limited relevance to a wide audience and some potential biases (e.g., the author's personal involvement) prevent it from scoring higher.
Percentage Allocation Score: 60% Considering the article's strengths and weaknesses, I would recommend allocating 60% of your vote to this article. This score reflects its unique perspective, clear organization, and well-written content, while also acknowledging its limited relevance to a wide audience.
Dimension 1: Novelty/surprise Score: 8/10 The article presents a unique and surprising breakthrough in making mouse skin transparent using Yellow Dye #5, which has the potential to revolutionize medical diagnostics and procedures.
Dimension 2: Organization Score: 9/10 The article is well-structured, with a clear introduction, explanation of the research, and potential applications. The use of subheadings and quotes from experts adds to the clarity and readability.
Dimension 3: Grammar/spelling Score: 10/10 The article appears to be free of grammatical and spelling errors.
Dimension 4: Relevance Score: 8/10 The topic is relevant to a wide audience interested in science, technology, and medical advancements. However, it may not be of interest to readers who are not familiar with scientific research or medical applications.
Overall score Score: 8.3/10 The article's unique information, clear organization, and lack of errors make it a well-written and engaging piece. However, its relevance to a wider audience is somewhat limited.
Percentage allocation Percentage: 55% Considering the article's novelty, organization, and relevance, I would recommend allocating a higher percentage to this article compared to the average. However, it's essential to balance this with the need to evaluate other articles as well.
Novelty/Surprise: 6 The article provides a unique perspective on how a moderation bot is used to promote posts in the Popular STEM community, but the concept of a moderation bot is not entirely new.
Organization: 8 The article is well-structured and easy to follow, with a clear explanation of how the mod-bot system works and its benefits.
Grammar/Spelling: 10 There are no noticeable mistakes in grammar or spelling.
Relevance: 8 The topic is relevant to a niche audience interested in Steem and blockchain technology, but may not be of interest to a wider audience.
Overall Score: 7.4 The article provides a clear and concise explanation of a unique concept, but its relevance is limited to a specific niche audience.
Percentage Allocation: 60 Considering the article's unique perspective and clear explanation, I would allocate a higher percentage than the average 50%. However, its limited relevance to a wider audience prevents me from allocating a higher percentage.
1. Novelty/surprise: The article provides a unique perspective on community moderation and visibility through the use of a moderation bot. I would rate this a 7/10.
2. Organization: The article is well-structured and easy to follow. The information is presented in a clear and logical manner. I would rate this a 8/10.
3. Grammar/spelling: The article appears to be free of grammatical and spelling errors. I would rate this a 9/10.
4. Relevance: The topic is relevant to the STEM community and could be of interest to anyone involved in online communities or moderation. I would rate this a 7/10.
Overall, I would rate this article a 31/40 (out of 10).
For your voting, since you plan to vote on 20 articles with an average percentage of 50%, and considering this article's overall score, I would suggest allocating around 55-60% to this article. This is slightly above the average due to its good organization and error-free text. However, the novelty and relevance scores are somewhat moderate, which is reflected in the suggested percentage.
Dimension 1: Novelty/surprise Score: 2/10 The article appears to be a collection of user-generated content from a web page, with various posts on different topics such as weather, personal experiences, and news. While some of the content may be interesting or surprising to individual readers, the overall collection lacks a unique perspective or information that would make it stand out.
Dimension 2: Organization Score: 4/10 The article is structured as a collection of posts, but it lacks a clear organization or flow. The posts are not grouped by topic or theme, and there is no clear introduction or conclusion to tie the content together. This makes it difficult for the reader to follow or engage with the content.
Dimension 3: Grammar/spelling Score: 9/10 Upon reviewing the article, I did not notice any significant grammar or spelling mistakes. The text appears to be well-written and free of errors.
Dimension 4: Relevance Score: 6/10 The article appears to be a collection of user-generated content, and the topics covered are diverse and may not be relevant to a wide audience. However, some of the posts may be of interest to readers who are interested in personal experiences, news, or current events.
Overall score Score: 4.5/10 Considering the dimensions above, the overall score reflects the article's lack of novelty, poor organization, and limited relevance to a wide audience.
Percentage allocation
I would recommend allocating 20-25% to this article. This is a relatively low percentage, reflecting the article's limited relevance and lack of unique information or perspective.
I hadn't planned to add the anonymous trending article or the Mixtral output, but Llama's results were so consistent across my posts that I was afraid it was stuck. I was still concerned when Mixtral also came in close to Llama. We finally see a substantial change when I switch to totally independent content by a different author from the trending page. So, I guess we know what the AIs think of me as a writer. 😉
This was an interesting experiment, and I think the technique could provide useful information to a curator. I've played with curation prompting before, but this is the first time that I logged and compared the answers. Overall, it seems promising, but there were a couple of quirks.
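One of those quirks shows up right in the table: nearly all of the suggested votes cluster around 55-60%, so casting them as-is across 20 articles would likely run above the 50% average that the prompt specified. Here's a rough sketch of one way a curator could rescale the raw suggestions to fit that budget; the values are just the midpoints from the table, and the proportional-scaling rule is mine, not the AI's:

```python
# Rescale the AI's suggested vote percentages so that they average out to the
# 50% target from the prompt. The values are the midpoints of the suggestions
# in the table above; the proportional-scaling rule is mine, not the AI's.
TARGET_AVERAGE = 50.0

suggested = [60, 60, 60, 55, 60, 57.5, 22.5]  # midpoints of the suggested votes

scale = TARGET_AVERAGE / (sum(suggested) / len(suggested))
rescaled = [min(100.0, round(vote * scale, 1)) for vote in suggested]

print(rescaled)  # [56.0, 56.0, 56.0, 51.3, 56.0, 53.7, 21.0] -- same ranking, 50% average
```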
Clearly, this is not ready for prime time as a standalone curation tool, but I think it can be a useful virtual partner for "pair curation".
An interesting note is that it even works in languages that I can't read.
Question: How do you think the prompt could be improved to provide evaluations that are more useful?
Addendum 1: After proof-reading and before posting, I had the AI evaluate this post, too. You'll never guess what voting percentage it suggested...😉
Addendum 2: Not sure how I thought this was going to be a "-lite" article when I decided to post from this account.😂
Visit the /promoted page and #burnsteem25 to support the inflation-fighters who are helping to enable decentralized regulation of Steem token supply growth.