Compare commits

...

10 commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
|  | eacfe97d98 | fix some things for sharkey support | 2024-03-21 00:51:37 +01:00 |
| smitten | d08f1f94ec | Update docs | 2023-07-20 00:24:18 -04:00 |
| smitten | c03c025773 | Support for CW partial regexp match by word. Update documentation with 'How to Use' | 2023-07-20 00:10:54 -04:00 |
| autumn | 59e9efe118 | remove any generated @'s in replies to prevent accidentally mentioning somebody else on the same instance | 2023-03-05 21:33:59 +00:00 |
| io | 03305c4a55 | use external pleroma.py | 2023-01-11 06:23:42 +00:00 |
| lucdev | 82320560a1 | Rate limiter: fix datetime parser (fixes #5) | 2022-12-29 01:32:22 +00:00 |
| Joel Beckmeyer | 6523a28e11 | fix some pleroma errors with async and 500 errors (#4). Body: add better recovery/handling of HTTP 500; remove unnecessary else | 2022-12-29 01:22:24 +00:00 |
| io | 766b60c09c | accept pleroma-style empty string CWs (fix #3) | 2021-09-27 11:17:56 +00:00 |
| io | 94949289f0 | normalize Pleroman posts without cws (fix #3) | 2021-09-27 11:17:56 +00:00 |
| io | 4492f6685c | remove repeated call to extract_post_content (#2). Body: fortunately, extract_post_content returns the same thing when called repeatedly, so this wasn't a big deal anyway | 2021-09-27 11:17:56 +00:00 |
8 changed files with 72 additions and 213 deletions

.gitignore (vendored): 5 additions
View file

```diff
@@ -19,3 +19,8 @@ __pycache__/
 !*.defaults.json
 venv/
 .venv/
+bin
+lib
+lib64
+pyenv.cfg
```

View file

```diff
@@ -1,9 +1,24 @@
 # pleroma-ebooks
+this is a very slightly modified version of <https://codeberg.org/autumn/pleroma-ebooks> which is modified from <https://github.com/ioistired/pleroma-ebooks>
+_autumn's changes_: removes any generated @'s in replies to prevent accidentally mentioning somebody else on the same instance
+_smitten's changes_: adjusts CW filtering to be word-based regexp instead of exact match
 It's like [@AgathaSorceress's mstdn-ebooks] but it supports Pleroma better.
 [@AgathaSorceress's mstdn-ebooks]: https://github.com/AgathaSorceress/mstdn-ebooks
+## How to Use
+1. Create your bot account on the server.
+2. Follow the user(s) you want to base the model on.
+3. Get an access token for your bot. See [mastodon-bot](https://tinysubversions.com/notes/mastodon-bot/) for details.
+4. Copy `config.defaults.json` to `config.json` and set as `access_token`. Make any other config tweaks you'd like.
+5. Run `fetch_posts.py` to collect the posts from the followed user(s).
+6. Run `gen.py` to generate the sentence and write it to the server.
+7. (optional) set up cron or other periodic auto-run. run `reply.py` in the background.
 ## Secure Fetch
 Secure fetch (aka authorised fetches, authenticated fetches, secure mode...) is *not* supported by pleroma-ebooks, and will fail to download any posts from users on instances with secure fetch enabled. For more information, see [this wiki page](https://github.com/Lynnesbian/mstdn-ebooks/wiki/Secure-fetch).
@@ -28,11 +43,11 @@ Configuring pleroma-ebooks is accomplished by editing `config.json`. If you want
 | site | https://botsin.space | The instance your bot will log in to and post from. This must start with `https://` or `http://` (preferably the latter) |
 | cw | null | The content warning (aka subject) pleroma-ebooks will apply to non-error posts. |
 | learn_from_cw | false | If true, pleroma-ebooks will learn from CW'd posts. |
-| ignored_cws | [] | If `learn_from_cw` is true, do not learn from posts with these CWs. |
+| ignored_cws | [] | If `learn_from_cw` is true, do not learn from posts with these CW words. matches case-insensitive and by word. |
 | mention_handling | 1 | 0: Never use mentions. 1: Only generate fake mentions in the middle of posts, never at the start. 2: Use mentions as normal (old behaviour). |
 | max_thread_length | 15 | The maximum number of bot posts in a thread before it stops replying. A thread can be 10 or 10000 posts long, but the bot will stop after it has posted `max_thread_length` times. |
 | strip_paired_punctuation | false | If true, pleroma-ebooks will remove punctuation that commonly appears in pairs, like " and (). This avoids the issue of posts that open a bracket (or quote) without closing it. |
-| limit_length | false | If true, the sentence length will be random between `length_lower_limit` and `length_upper_limit` |
+| limit_length | false | If true, the sentence word length will be random between `length_lower_limit` and `length_upper_limit` |
 | length_lower_limit | 5 | The lower bound in the random number range above. Only matters if `limit_length` is true. |
 | length_upper_limit | 50 | The upper bound in the random number range above. Can be the same as `length_lower_limit` to disable randomness. Only matters if `limit_length` is true. |
 | overlap_ratio_enabled | false | If true, checks the output's similarity to the original posts. |
@@ -48,4 +63,4 @@ Please don't feel obligated to donate at all.
 This is released under the AGPLv3 (only) license, and based on Lynnesbian's fork which is under the MPL 2.0 license. See LICENSE-AGPL.md and LICENSE-MPL for details.
-**This means you must publish the source code of any ebooks bot you make with this.** A link back to this repository on your bot's profile page or profile metadata will suffice. If you make changes to the code you need to link to your fork/repo instead
+**This means you must publish the source code of any ebooks bot you make with this.** A link back to this repository on your bot's profile page or profile metadata will suffice. If you make changes to the code you need to link to your fork/repo instead.
```
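The "How to Use" steps added above assume a filled-in `config.json`. As a hedged illustration of step 4 (not part of the diff), here is a minimal loader sketch using the `json5` dependency pinned in `requirements.txt`; the overlay/merge behaviour and the helper name are assumptions, not necessarily how the bot's own loader works.

```python
# Illustrative only: a minimal config loader for step 4 of "How to Use".
# File names come from the README; merging defaults under the user config
# is an assumption made for this sketch.
import json5  # pinned as json5 ~= 0.9.5 in requirements.txt

def load_config(user_path="config.json", defaults_path="config.defaults.json"):
    with open(defaults_path) as f:
        cfg = json5.load(f)
    with open(user_path) as f:
        cfg.update(json5.load(f))  # user settings override the defaults
    return cfg

cfg = load_config()
assert cfg.get("access_token"), "step 4: set access_token in config.json"
```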

View file

```diff
@@ -10,19 +10,13 @@ import operator
 import aiosqlite
 import contextlib
 from yarl import URL
-from pleroma import Pleroma
+from pleroma import Pleroma, HandleRateLimits
 from bs4 import BeautifulSoup
 from functools import partial
 from typing import Iterable, NewType
-from utils import shield, HandleRateLimits, suppress
+from utils import shield, suppress, http_session_factory
 from third_party.utils import extract_post_content
-
-USER_AGENT = (
-    'pleroma-ebooks; '
-    f'{aiohttp.__version__}; '
-    f'{platform.python_implementation()}/{platform.python_version()}'
-)
 
 UTC = pendulum.timezone('UTC')
 JSON_CONTENT_TYPE = 'application/json'
 ACTIVITYPUB_CONTENT_TYPE = 'application/activity+json'
@@ -40,11 +34,8 @@ class PostFetcher:
             Pleroma(api_base_url=self.config['site'], access_token=self.config['access_token']),
         )
         self._http = await stack.enter_async_context(
-            aiohttp.ClientSession(
-                headers={
-                    'User-Agent': USER_AGENT,
-                    'Accept': ', '.join([JSON_CONTENT_TYPE, ACTIVITYPUB_CONTENT_TYPE]),
-                },
+            http_session_factory(
+                headers={'Accept': ', '.join([JSON_CONTENT_TYPE, ACTIVITYPUB_CONTENT_TYPE])},
                 trust_env=True,
                 raise_for_status=True,
             ),
@@ -137,7 +128,11 @@ class PostFetcher:
         obj = activity['object']
-        content = extract_post_content(obj['content'])
+        try:
+            obj['summary']
+        except KeyError:
+            obj['summary'] = None
 
         await self._db.execute(
             """
             INSERT INTO posts (post_id, summary, content, published_at)
@@ -145,7 +140,9 @@ class PostFetcher:
             """,
             (
                 obj['id'],
-                obj['summary'],
+                # Pleroma returns an empty string here for posts without a CW,
+                # which is semantically incorrect IMO
+                obj['summary'] or None,
                 extract_post_content(obj['content']),
                 pendulum.parse(obj['published']).astimezone(pendulum.timezone('UTC')).timestamp(),
             ),
```
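The last two hunks do one thing: ensure the `summary` column only ever stores a real CW or NULL. A small, self-contained sketch of that normalization (the dict shape mirrors the ActivityPub objects handled above; the helper name is mine):

```python
# Sketch of the CW normalization performed above: a missing 'summary' key
# (some backends omit it) and Pleroma's empty-string "no CW" both become None,
# so they end up as NULL in the posts table.
def normalize_summary(obj):
    summary = obj.get('summary')  # None if the key is absent
    return summary or None        # '' also collapses to None

assert normalize_summary({}) is None
assert normalize_summary({'summary': ''}) is None
assert normalize_summary({'summary': 'food'}) == 'food'
```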

View file

```diff
@@ -2,6 +2,7 @@
 import sqlite3
 import markovify
+import regex
 
 def make_sentence(cfg):
     class nlt_fixed(markovify.NewlineText):  # modified version of NewlineText that never rejects sentences
@@ -10,19 +11,21 @@ def make_sentence(cfg):
     db = sqlite3.connect(cfg["db_path"])
     db.text_factory = str
+    p = regex.compile(r"\L<words>", words=cfg["ignored_cws"], flags=regex.IGNORECASE)
+    def cw_regexp(x):
+        return 1 if p.search(x) else 0
+    db.create_function('cwregexp', 1, cw_regexp)
     c = db.cursor()
     if cfg['learn_from_cw']:
-        ignored_cws_query_params = "(" + ",".join("?" * len(cfg["ignored_cws"])) + ")"
         toots = c.execute(
             f"""
             SELECT content
             FROM posts
             WHERE
                 summary IS NULL
-                OR summary NOT IN {ignored_cws_query_params}
+                OR NOT CWREGEXP(summary)
             ORDER BY RANDOM() LIMIT 10000
             """,
-            cfg["ignored_cws"],
         ).fetchall()
     else:
         toots = c.execute(
```
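For readers unfamiliar with the third-party `regex` module, the new filter compiles the `ignored_cws` list into a named-list pattern (`\L<words>`) and exposes it to SQLite as a `CWREGEXP()` function, so a summary containing any ignored word (case-insensitively) is skipped. A runnable, self-contained sketch of the same mechanism; the in-memory database and example CW words are mine, the query shape matches the hunk above:

```python
# Self-contained sketch of the word-based CW filter added above.
import sqlite3
import regex  # third-party module; supports named lists via \L<name>

ignored_cws = ["food", "politics"]  # example values for the ignored_cws config key
pattern = regex.compile(r"\L<words>", words=ignored_cws, flags=regex.IGNORECASE)

def cw_regexp(summary):
    # Guard against NULL summaries defensively; gen.py itself relies on the
    # `summary IS NULL OR ...` short circuit in the query.
    return 1 if summary and pattern.search(summary) else 0

db = sqlite3.connect(":memory:")
db.create_function("cwregexp", 1, cw_regexp)
db.execute("CREATE TABLE posts (summary TEXT, content TEXT)")
db.executemany("INSERT INTO posts VALUES (?, ?)", [
    (None, "post without a CW"),
    ("Food pics", "filtered out"),
    ("misc", "kept"),
])
rows = db.execute(
    "SELECT content FROM posts WHERE summary IS NULL OR NOT CWREGEXP(summary)"
).fetchall()
print(rows)  # expected: [('post without a CW',), ('kept',)]
```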

View file

```diff
@@ -1,156 +0,0 @@
-# SPDX-License-Identifier: AGPL-3.0-only
-
-import sys
-import yarl
-import json
-import hashlib
-import aiohttp
-from http import HTTPStatus
-
-def http_session_factory(headers={}):
-    py_version = '.'.join(map(str, sys.version_info))
-    user_agent = (
-        'pleroma-ebooks (https://github.com/ioistired/pleroma-ebooks); '
-        'aiohttp/{aiohttp.__version__}; '
-        'python/{py_version}'
-    )
-    return aiohttp.ClientSession(
-        headers={'User-Agent': user_agent, **headers},
-    )
-
-class BadRequest(Exception):
-    pass
-
-class Pleroma:
-    def __init__(self, *, api_base_url, access_token):
-        self.api_base_url = api_base_url.rstrip('/')
-        self.access_token = access_token
-        self._session = http_session_factory({'Authorization': 'Bearer ' + access_token})
-        self._logged_in_id = None
-
-    async def __aenter__(self):
-        self._session = await self._session.__aenter__()
-        return self
-
-    async def __aexit__(self, *excinfo):
-        return await self._session.__aexit__(*excinfo)
-
-    async def request(self, method, path, **kwargs):
-        # blocklist of some horrible instances
-        if hashlib.sha256(
-            yarl.URL(self.api_base_url).host.encode()
-            + bytes.fromhex('d590e3c48d599db6776e89dfc8ebaf53c8cd84866a76305049d8d8c5d4126ce1')
-        ).hexdigest() in {
-            '56704d4d95b882e81c8e7765e9079be0afc4e353925ba9add8fd65976f52db83',
-            '1932431fa41a0baaccce7815115b01e40e0237035bb155713712075b887f5a19',
-            'a42191105a9f3514a1d5131969c07a95e06d0fdf0058f18e478823bf299881c9',
-        }:
-            raise RuntimeError('stop being a chud')
-
-        async with self._session.request(method, self.api_base_url + path, **kwargs) as resp:
-            if resp.status == HTTPStatus.BAD_REQUEST:
-                raise BadRequest((await resp.json())['error'])
-            #resp.raise_for_status()
-            return await resp.json()
-
-    async def verify_credentials(self):
-        return await self.request('GET', '/api/v1/accounts/verify_credentials')
-
-    me = verify_credentials
-
-    async def _get_logged_in_id(self):
-        if self._logged_in_id is None:
-            self._logged_in_id = (await self.me())['id']
-        return self._logged_in_id
-
-    async def following(self, account_id=None):
-        account_id = account_id or await self._get_logged_in_id()
-        return await self.request('GET', f'/api/v1/accounts/{account_id}/following')
-
-    @staticmethod
-    def _unpack_id(obj):
-        if isinstance(obj, dict) and 'id' in obj:
-            return obj['id']
-        return obj
-
-    async def status_context(self, id):
-        id = self._unpack_id(id)
-        return await self.request('GET', f'/api/v1/statuses/{id}/context')
-
-    async def post(self, content, *, in_reply_to_id=None, cw=None, visibility=None):
-        if visibility not in {None, 'private', 'public', 'unlisted', 'direct'}:
-            raise ValueError('invalid visibility', visibility)
-
-        data = dict(status=content)
-        if in_reply_to_id := self._unpack_id(in_reply_to_id):
-            data['in_reply_to_id'] = in_reply_to_id
-        if visibility is not None:
-            data['visibility'] = visibility
-        if cw is not None:
-            data['spoiler_text'] = cw
-
-        return await self.request('POST', '/api/v1/statuses', data=data)
-
-    async def reply(self, to_status, content, *, cw=None):
-        user_id = await self._get_logged_in_id()
-
-        mentioned_accounts = {}
-        mentioned_accounts[to_status['account']['id']] = to_status['account']['acct']
-        for account in to_status['mentions']:
-            if account['id'] != user_id and account['id'] not in mentioned_accounts:
-                mentioned_accounts[account['id']] = account['acct']
-
-        content = ''.join('@' + x + ' ' for x in mentioned_accounts.values()) + content
-
-        visibility = 'unlisted' if to_status['visibility'] == 'public' else to_status['visibility']
-        if cw is None and 'spoiler_text' in to_status and to_status['spoiler_text']:
-            cw = 're: ' + to_status['spoiler_text']
-
-        return await self.post(content, in_reply_to_id=to_status['id'], cw=cw, visibility=visibility)
-
-    async def favorite(self, id):
-        id = self._unpack_id(id)
-        return await self.request('POST', f'/api/v1/statuses/{id}/favourite')
-
-    async def unfavorite(self, id):
-        id = self._unpack_id(id)
-        return await self.request('POST', f'/api/v1/statuses/{id}/unfavourite')
-
-    async def react(self, id, reaction):
-        id = self._unpack_id(id)
-        return await self.request('PUT', f'/api/v1/pleroma/statuses/{id}/reactions/{reaction}')
-
-    async def remove_reaction(self, id, reaction):
-        id = self._unpack_id(id)
-        return await self.request('DELETE', f'/api/v1/pleroma/statuses/{id}/reactions/{reaction}')
-
-    async def pin(self, id):
-        id = self._unpack_id(id)
-        return await self.request('POST', f'/api/v1/statuses/{id}/pin')
-
-    async def unpin(self, id):
-        id = self._unpack_id(id)
-        return await self.request('POST', f'/api/v1/statuses/{id}/unpin')
-
-    async def stream(self, stream_name, *, target_event_type=None):
-        async with self._session.ws_connect(
-            self.api_base_url + f'/api/v1/streaming?stream={stream_name}&access_token={self.access_token}'
-        ) as ws:
-            async for msg in ws:
-                if msg.type == aiohttp.WSMsgType.TEXT:
-                    event = msg.json()
-                    # the only event type that doesn't define `payload` is `filters_changed`
-                    if event['event'] == 'filters_changed':
-                        yield event
-                    elif target_event_type is None or event['event'] == target_event_type:
-                        # don't ask me why the payload is also JSON encoded smh
-                        yield json.loads(event['payload'])
-
-    async def stream_notifications(self):
-        async for notif in self.stream('user:notification', target_event_type='notification'):
-            yield notif
-
-    async def stream_mentions(self):
-        async for notif in self.stream_notifications():
-            if notif['type'] == 'mention':
-                yield notif
```
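With this local module deleted, the client now comes from the external `pleroma.py` package pinned in `requirements.txt` (the "use external pleroma.py" commit). Judging by the import and constructor calls in the other hunks, the external class keeps the same interface; a hedged usage sketch follows, where the instance URL and token are placeholders and the exact package API is assumed to match the deleted class:

```python
# Hedged sketch: assumes the external pleroma.py package mirrors the deleted
# class above (async context manager, verify_credentials(), etc.), as the
# imports and calls in the fetch_posts hunks suggest.
import anyio
from pleroma import Pleroma

async def whoami(cfg):
    async with Pleroma(api_base_url=cfg['site'], access_token=cfg['access_token']) as client:
        me = await client.verify_credentials()
        print(me['id'], me['acct'])

# anyio.run(whoami, {'site': 'https://example.tld', 'access_token': '...'})  # placeholders
```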

View file

```diff
@@ -22,10 +22,22 @@ class ReplyBot:
         async for notification in self.pleroma.stream_mentions():
             await self.process_notification(notification)
 
-    async def process_notification(self, notification):
+    async def process_notification(self, notification, retry_count=0):
         acct = "@" + notification['account']['acct']  # get the account's @
         post_id = notification['status']['id']
-        context = await self.pleroma.status_context(post_id)
+
+        # catch HTTP 500 and backoff on requests
+        retry_count = retry_count + 1
+        try:
+            context = await self.pleroma.status_context(post_id)
+        except pleroma.BadResponse as exc:
+            if retry_count < 3:
+                await anyio.sleep(2**retry_count)
+                await self.process_notification(notification, retry_count)
+            else:
+                # failed too many times in a row, logging
+                print(f"Received HTTP 500 {retry_count} times in a row, aborting reply attempt.")
+            return
 
         # check if we've already been participating in this thread
         if self.check_thread_length(context):
@@ -69,12 +81,13 @@ class ReplyBot:
         await self.pleroma.react(post_id, '')
 
     async def reply(self, notification):
-        toot = utils.make_toot(self.cfg)  # generate a toot
+        toot = await utils.make_post(self.cfg)  # generate a toot
+        toot = re.sub(r"@\S+\s", r"", toot)  # remove any generated @'s
         await self.pleroma.reply(notification['status'], toot, cw=self.cfg['cw'])
 
     @staticmethod
     def extract_toot(toot):
-        text = utils.extract_toot(toot)
+        text = utils.extract_post_content(toot)
        text = re.sub(r"^@\S+\s", r"", text)  # remove the initial mention
        text = text.lower()  # treat text as lowercase for easier keyword matching (if this bot uses it)
        return text
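The retry added to `process_notification` is plain exponential backoff: on a `pleroma.BadResponse` it sleeps 2 and then 4 seconds and gives up after the third failure. The same idea as a standalone helper, for illustration only; reply.py keeps the logic inline, and the helper name below is mine:

```python
# Illustration of the backoff pattern used above, not part of the diff.
import anyio
import pleroma  # external pleroma.py package; BadResponse is the exception used above

async def with_backoff(make_request, attempts=3):
    for retry_count in range(1, attempts + 1):
        try:
            return await make_request()
        except pleroma.BadResponse:
            if retry_count >= attempts:
                print(f"Received HTTP 500 {retry_count} times in a row, aborting.")
                return None
            await anyio.sleep(2 ** retry_count)  # 2 s, then 4 s

# usage inside the bot would look like:
# context = await with_backoff(lambda: self.pleroma.status_context(post_id))
```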
```
View file

```diff
@@ -1,4 +1,5 @@
 beautifulsoup4 ~= 4.9
+pleroma.py ~= 0.0.1
 aiohttp ~= 3.0
 json5 ~= 0.9.5
 anyio ~= 3.0
```

View file

```diff
@@ -1,9 +1,10 @@
 # SPDX-License-Identifier: AGPL-3.0-only
 
+import sys
 import anyio
+import aiohttp
 import contextlib
 from functools import wraps
-from datetime import datetime, timezone
 
 def as_corofunc(f):
     @wraps(f)
@@ -35,34 +36,14 @@ def removeprefix(s, prefix):
         # compatibility for pre-3.9
         return s[len(prefix):] if s.startswith(prefix) else s
 
-async def sleep_until(dt):
-    await anyio.sleep((dt - datetime.now(timezone.utc)).total_seconds())
-
-class HandleRateLimits:
-    def __init__(self, http):
-        self.http = http
-
-    def request(self, *args, **kwargs):
-        return _RateLimitContextManager(self.http, args, kwargs)
-
-class _RateLimitContextManager(contextlib.AbstractAsyncContextManager):
-    def __init__(self, http, args, kwargs):
-        self.http = http
-        self.args = args
-        self.kwargs = kwargs
-
-    async def __aenter__(self):
-        self._request_cm = self.http.request(*self.args, **self.kwargs)
-        return await self._do_enter()
-
-    async def _do_enter(self):
-        resp = await self._request_cm.__aenter__()
-        if resp.headers.get('X-RateLimit-Remaining') not in {'0', '1'}:
-            return resp
-
-        await sleep_until(datetime.fromisoformat(resp.headers['X-RateLimit-Reset']))
-        await self._request_cm.__aexit__(*(None,)*3)
-        return await self.__aenter__()
-
-    async def __aexit__(self, *excinfo):
-        return await self._request_cm.__aexit__(*excinfo)
+def http_session_factory(headers={}, **kwargs):
+    py_version = '.'.join(map(str, sys.version_info))
+    user_agent = (
+        'pleroma-ebooks (https://github.com/ioistired/pleroma-ebooks); '
+        f'aiohttp/{aiohttp.__version__}; '
+        f'python/{py_version}'
+    )
+    return aiohttp.ClientSession(
+        headers={'User-Agent': user_agent, **headers},
+        **kwargs,
+    )
```
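The new `http_session_factory` centralises the User-Agent header and forwards any extra `aiohttp.ClientSession` keyword arguments, which is how the fetcher above passes `trust_env` and `raise_for_status`. A short usage sketch, with a placeholder instance URL:

```python
# Usage sketch for the new factory; mirrors the fetch_posts.py call above.
# https://example.tld is a placeholder instance, not a real endpoint.
import anyio
from utils import http_session_factory

async def main():
    async with http_session_factory(
        headers={'Accept': 'application/json, application/activity+json'},
        trust_env=True,
        raise_for_status=True,
    ) as session:
        async with session.get('https://example.tld/api/v1/instance') as resp:
            print(await resp.json())

# anyio.run(main)
```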