Web to Gemini proxy
Pāpiliō levior est ave (The butterfly is lighter than the bird)
levior (a Latin word meaning "lighter") is a web (HTTP/HTTPS) to Gemini proxy. It converts web pages (as well as Atom/RSS feeds) to the gemtext format on the fly, allowing you to browse the web with any Gemini browser.
- Builds an RDF graph of the visited pages using linked data.
- Supports JavaScript rendering and can therefore be used to browse dynamic websites.
- Supports serving other types of content, such as ZIM files (the archive format used by Wikipedia), making it possible to browse complete wikis through Gemini (see the config file).
Supporting this project
If you want to support this project, you can make a donation here (or here).
You can get in touch via misfin at the following address: cipres AT hashnix.club.
Installation
AppImage
You can get the latest AppImage (for the x86_64 platform) here. The following commands install levior in ~/.local/bin:
curl -L -o ~/.local/bin/levior https://gitlab.com/cipres/levior/-/releases/continuous-master/downloads/levior-latest-x86_64.AppImage
chmod +x ~/.local/bin/levior
Manual install
Clone the repo and create a virtualenv:
git clone https://gitlab.com/cipres/levior && cd levior
python3 -m venv venv; source venv/bin/activate
Upgrade pip and install:
For ZIM or uvloop support, install the extra requirements:
For JavaScript rendering, install the js extra:
Manual install (ARM, aarch64, Raspberry Pi, and others)
One of the dependencies, aiogemini, requires the cryptography package. Since version 35.0, cryptography requires Rust to build, and Rust might not be available on your system. If you don't have Rust, you can install an older version of cryptography that does not require it by running:
Usage
levior can be configured from the command line or via a YAML config file. If a config file is provided, settings from both sources are merged into a single config, with the config file settings taking precedence. See the example config file. URL rules can only be configured with a config file.
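The precedence rule above can be illustrated with a small sketch. It assumes both sources are plain nested dictionaries; `merge_settings` and the example keys are hypothetical, and this is not levior's actual merging code:

```python
from typing import Any, Dict


def merge_settings(cli: Dict[str, Any], file: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge two settings dicts; values from `file` win on conflicts."""
    merged: Dict[str, Any] = dict(cli)  # shallow copy, inputs are not mutated
    for key, value in file.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides define a mapping: merge them key by key
            merged[key] = merge_settings(merged[key], value)
        else:
            # The config file value overrides (or adds to) the command-line value
            merged[key] = value
    return merged


# Hypothetical example values
cli_settings = {"mode": "proxy,server", "daemonize": False, "section": {"a": 1, "b": 2}}
file_settings = {"daemonize": True, "section": {"b": 3}}

config = merge_settings(cli_settings, file_settings)
print(config)  # {'mode': 'proxy,server', 'daemonize': True, 'section': {'a': 1, 'b': 3}}
```

Keys that only appear on the command line survive the merge; only conflicting keys are taken from the file.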
levior uses the OmegaConf library to parse the YAML config files, therefore all the specific syntax elements supported by OmegaConf can be used in your configuration files. levior provides several resolvers that you can use inside your config file.
Once levior is running, open your gemini browser and go to gemini://localhost.
Proxies (HTTP, Socks4 and Socks5) are supported.
Generating a new configuration file
Daemonization
Use --daemon or -d to run levior as a daemon, or set the daemonize setting in the config file:
daemonize: true
Custom SSL certificate
By default, levior will use a built-in SSL certificate and key that is appropriate when the service is listening on localhost.
If you are configuring levior to listen on a non-local interface, you will first need to generate your own SSL keypair. Then, in your config file, set the cert and key attributes to point to the file paths of your SSL certificate and key.
You can also use the --cert and --key command-line parameters.
Logging
Access log
Requests are logged as gemtext links. Use --log-file if you want the access log to be written to a file.
- If you are not running levior as a daemon, and you don't specify an access log file path, requests are logged to the console
- If you are running levior as a daemon, requests are logged to the specified log file (or the default: levior-log.gmi)
Access log server endpoint
Set access_log_endpoint to true in your config file to enable the access log endpoint /access_log on the server. This endpoint shows the proxy's access log in the gemtext format.
Restricting access by IP address or network
You can restrict access to the proxy by declaring a list of allowed IP addresses or networks in your config file.
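The config syntax for this list isn't shown here, so the following is only a sketch of the membership check itself, using Python's standard `ipaddress` module. `is_allowed` and the example list are hypothetical, not levior's API:

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowed: Iterable[str]) -> bool:
    """Return True if client_ip is one of the allowed addresses or networks."""
    address = ipaddress.ip_address(client_ip)
    for entry in allowed:
        # strict=False accepts entries like "192.168.1.5/24"; a bare address
        # such as "127.0.0.1" becomes a single-host network (/32 or /128)
        network = ipaddress.ip_network(entry, strict=False)
        # An IPv4 address is never "in" an IPv6 network (and vice versa)
        if address in network:
            return True
    return False


allowed = ["127.0.0.1", "192.168.1.0/24", "::1"]
print(is_allowed("192.168.1.42", allowed))  # True
print(is_allowed("10.0.0.1", allowed))      # False
```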
RDF
levior uses an RDF graph to store various attributes of the pages accessed via the proxy:
- The page's title
- Every link in the page, recorded as being referenced by the source page
- The gemtext headings of the page, which form its table of contents
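To make the "referenced by" relation concrete, here is a minimal sketch that extracts a page's title and links with the standard `html.parser` module and emits them as (subject, predicate, object) triples. The predicate names `title` and `isReferencedBy` are placeholders (the latter borrows a Dublin Core term name); levior's actual vocabulary and storage are not described here, and headings are left out for brevity:

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin

Triple = Tuple[str, str, str]


class PageGrapher(HTMLParser):
    """Collect the title and outgoing links of one HTML page."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.title: Optional[str] = None
        self.links: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL
                self.links.append(urljoin(self.page_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data

    def triples(self) -> List[Triple]:
        # Placeholder predicates, not levior's real vocabulary
        result: List[Triple] = []
        if self.title:
            result.append((self.page_url, "title", self.title.strip()))
        for link in self.links:
            result.append((link, "isReferencedBy", self.page_url))
        return result


html = '<html><head><title>Demo</title></head><body><a href="/about">About</a></body></html>'
grapher = PageGrapher("https://example.org/")
grapher.feed(html)
grapher.close()
print(grapher.triples())
```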
If you want to disable the automatic graphing of pages, set graph_visited_pages to false in the config:
graph_visited_pages: false
URL mapping
Define urlmap in your config file to map specific paths (on levior's gemini server) to certain URLs.
urlmap:
  # When /searx is requested without a gemini query, it will send
  # an input response. When the input is sent back, it will redirect the
  # user to "https://searx.be/search?q={input}"
  /searx:
    input_for: https://searx.be/search?q=
    route_name: Search with SearX
  /liteduck:
    input_for: https://lite.duckduckgo.com/lite/?q=
    route_name: DuckDuckGo Lite search
  # Mapping with variables in the path
  # /z/test => https://searx.be/search?q=test
  /z/{query}:
    url: https://searx.be/search?q={query}
If you set route_name, the route will appear on levior's homepage.
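The two kinds of mapping in the example (input_for and path variables) can be sketched as follows. This is an illustration, not levior's router: `template_to_regex`, `resolve` and `resolve_input` are hypothetical, a path variable is assumed to match one path segment, and URL-encoding the user input is an assumption:

```python
import re
from typing import Dict, Optional
from urllib.parse import quote

# Mirrors the urlmap example above (values simplified to plain strings)
URLMAP: Dict[str, str] = {"/z/{query}": "https://searx.be/search?q={query}"}
INPUT_FOR: Dict[str, str] = {"/searx": "https://searx.be/search?q="}


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Turn '/z/{query}' into a regex with a named group per variable."""
    parts = re.split(r"\{(\w+)\}", template)
    # re.split with a capturing group alternates: literal, name, literal, ...
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/]+)"
        for i, part in enumerate(parts)
    )
    return re.compile(f"^{pattern}$")


def resolve(path: str) -> Optional[str]:
    for template, target in URLMAP.items():
        match = template_to_regex(template).match(path)
        if match:
            return target.format(**match.groupdict())
    return None


def resolve_input(path: str, user_input: str) -> Optional[str]:
    # input_for routes: append the (URL-encoded) user input to the prefix
    prefix = INPUT_FOR.get(path)
    return prefix + quote(user_input) if prefix is not None else None


print(resolve("/z/test"))                       # https://searx.be/search?q=test
print(resolve_input("/searx", "gemini proxy"))  # https://searx.be/search?q=gemini%20proxy
```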
URL rules
You can define your own rules in order to apply some processing on the gemtext that will be sent to the browser, or return a specific gemini response.
A rule must define which URL(s) to match with the url attribute, which can be a regular expression or a list of regular expressions. If the response attribute is defined, the status attribute must be set as an aiogemini Status code. Here are some basic examples of custom rules:
Set js_render in the rule to enable JS rendering.
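The url attribute accepts a single regular expression or a list of them. A minimal sketch of that matching logic follows; `rule_matches` is hypothetical, and using `re.search` (match anywhere in the URL) rather than `re.match` is an assumption, so anchor patterns with ^ and $ if you want to be explicit:

```python
import re
from typing import List, Union

UrlSpec = Union[str, List[str]]


def rule_matches(url_spec: UrlSpec, url: str) -> bool:
    """Return True if url matches the rule's regex, or any regex in the list."""
    patterns = [url_spec] if isinstance(url_spec, str) else url_spec
    # re.search finds the pattern anywhere in the URL (assumed behavior)
    return any(re.search(pattern, url) for pattern in patterns)


rule_url = ["^https://searx.be/search", "^https://lite.duckduckgo.com/lite/search"]
print(rule_matches(rule_url, "https://searx.be/search?q=gemini"))  # True
print(rule_matches(r".*\.png$", "https://example.org/logo.png"))  # True
print(rule_matches(rule_url, "https://example.org/"))             # False
```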
Caching
The raw content of the web resources fetched by the proxy can be cached. The result of the geminification of the pages (the gemtext document) is never cached.
Set the cache attribute in your rule to cache the data. The ttl (time-to-live) attribute sets how long (in seconds) the resource's content stays in the cache. The data is served from the cache until the ttl expires; requests made after that trigger a refetch.
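The ttl semantics can be illustrated with a small in-memory cache. This is a sketch only: levior's actual cache backend is not described here, and `TTLCache` is hypothetical. As in levior, only the raw fetched content is stored, never the generated gemtext:

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Store raw fetched content with a per-entry time-to-live (in seconds)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def put(self, url: str, content: bytes, ttl: float) -> None:
        # Remember the moment this entry stops being valid
        self._entries[url] = (self._clock() + ttl, content)

    def get(self, url: str) -> Optional[bytes]:
        entry = self._entries.get(url)
        if entry is None:
            return None
        expires_at, content = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller refetches from the web
            del self._entries[url]
            return None
        return content


# Fake clock so the example runs instantly
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("https://example.org", b"<html>...</html>", ttl=60)

now[0] = 30.0
print(cache.get("https://example.org"))  # b'<html>...</html>' (still fresh)
now[0] = 61.0
print(cache.get("https://example.org"))  # None (expired, must refetch)
```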
Caching the access log
The access log can be persisted in the cache via the persist_access_log setting (or with --persist-access-log). This is disabled by default.
Caching links on pages
With the page_cachelinks setting, links for caching the current page for a few days (or forever) are shown at the top of the page. This makes it easy to cache a page you've just browsed without having to define custom rules.
Includes
It is also possible to load predefined rules by using the include keyword in your config file. If you prefix the path with levior:, it will be loaded from the builtin rules library (please open a PR to submit new rules), otherwise it is assumed to be a local file.
When you use the levior: prefix, you can pass a glob-style pattern, allowing you to source multiple files in a single include.
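The include resolution described above (levior: prefix with glob support versus a local path) can be sketched with the standard `fnmatch` module. The builtin rule names, the "builtin/" prefix and `resolve_include` are all hypothetical stand-ins, not levior's real library layout:

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import List

# Hypothetical names standing in for the builtin rules library
BUILTIN_RULES = ["words_upper.yaml", "puretext.yaml", "feeds_news.yaml"]


def resolve_include(path: str) -> List[str]:
    """Expand one include entry into the list of rule files it refers to."""
    prefix = "levior:"
    if path.startswith(prefix):
        pattern = path[len(prefix):]
        # Glob patterns are only expanded against the builtin library
        return [f"builtin/{name}" for name in BUILTIN_RULES if fnmatch(name, pattern)]
    # Anything else is treated as a single path on the local filesystem
    return [str(Path(path).expanduser())]


print(resolve_include("levior:feeds_*"))   # ['builtin/feeds_news.yaml']
print(resolve_include("./my_rules.yaml"))  # ['my_rules.yaml']
```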
Rules can receive parameters, which makes it possible to write generic rules that can be applied to any URL.
To pass params to a rule from the config file, set the rule path with the src attribute and the params with the with attribute:
include:
  - src: words_upper.yaml
    with:
      URL:
        - https://example.org/.*.html
        - https://domain.io
      uwords:
        - coffee
        - milk
The puretext rule keeps only the text content:
Proxies
Default proxy
You can set the default proxy URL with the proxy attribute, whose value must be a proxy URL or a list of proxy URLs, to establish a proxy chain. HTTP, Socks4 and Socks5 proxies are supported.
Defining a single proxy:
To use a proxy chain (routing requests through multiple proxies in sequence, for example to browse more anonymously or to bypass geo-restrictions), declare your proxies as a list (the order matters):
Random proxies
You can use the random OmegaConf resolver to choose a random proxy from a predefined list. The resolver is called on every request, so a proxy URL is randomly chosen from the list for each request:
my_proxies:
- http://10.0.1.2:8090
- http://10.0.4.2:8092
- http://10.0.8.4:8094
proxy: ${random:${my_proxies}}
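The point of the resolver is that the choice is re-evaluated per request rather than fixed once at startup. A minimal sketch of that behavior in plain Python; `proxy_for_request` is hypothetical and does not use OmegaConf:

```python
import random
from typing import List, Optional

MY_PROXIES: List[str] = [
    "http://10.0.1.2:8090",
    "http://10.0.4.2:8092",
    "http://10.0.8.4:8094",
]

_DEFAULT_RNG = random.Random()


def proxy_for_request(proxies: List[str], rng: Optional[random.Random] = None) -> str:
    """Pick one proxy URL; meant to be called once per request."""
    return (rng or _DEFAULT_RNG).choice(proxies)


rng = random.Random(42)  # seeded only to make the demo reproducible
picks = [proxy_for_request(MY_PROXIES, rng) for _ in range(5)]
print(picks)  # each entry is one of MY_PROXIES
```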
Setting a proxy for a rule
Setting a proxy when including another config file
When including one or more config files, you can set the proxy that will be used for the included rules:
HTTP headers
In a rule or at the top of the config file, you can set specific HTTP headers that will be used when making HTTP requests:
Feeds aggregator
It is possible to aggregate multiple Atom/RSS web feeds into a single tinylog by setting the rule type to feeds_aggregator and defining the list of feeds. Example:
rules:
  - url: '^gemini://localhost/francetv'
    type: 'feeds_aggregator'
    # "feeds" is a dictionary, the key must be the feed's URL, the
    # dict value is for the feed's options
    feeds:
      https://www.francetvinfo.fr/titres.rss: {}
      https://www.francetvinfo.fr/monde.rss: {}
      https://www.francetvinfo.fr/culture.rss:
        enabled: false
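In the example, feeds with empty options are fetched and the one with enabled: false is skipped. A minimal sketch of that selection, assuming a feed is enabled unless its options say otherwise; `enabled_feeds` is hypothetical:

```python
from typing import Any, Dict, List, Optional

# Same shape as the "feeds" dictionary in the rule above
feeds: Dict[str, Optional[Dict[str, Any]]] = {
    "https://www.francetvinfo.fr/titres.rss": {},
    "https://www.francetvinfo.fr/monde.rss": {},
    "https://www.francetvinfo.fr/culture.rss": {"enabled": False},
}


def enabled_feeds(feeds: Dict[str, Optional[Dict[str, Any]]]) -> List[str]:
    """Return the feed URLs to fetch; a feed is enabled unless enabled is false."""
    # "options or {}" also covers a YAML key with no value (None)
    return [url for url, options in feeds.items() if (options or {}).get("enabled", True)]


print(enabled_feeds(feeds))
# ['https://www.francetvinfo.fr/titres.rss', 'https://www.francetvinfo.fr/monde.rss']
```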
When you are sourcing a config file that includes aggregation rules, you can enable or disable certain feeds using the parameters:
Gemtext filters
It's possible to run filters on the gemtext content that will be sent to the browser. In your config file, set the gemtext_filters property for the rule. For example, this will remove any email address link by running the strip_emailaddrs function found in the levior.filters.links python module (if you don't specify a function name, it will call the gemtext_filter function/coroutine in that module by default):
urules:
  - url:
      - "https://searx.be/search"
      - "https://lite.duckduckgo.com/lite/search"
    gemtext_filters:
      - levior.filters.links:strip_emailaddrs
      - filter: levior.filters:get_out
        re:
          - 'google'
          - 'stop'
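The "module:function" notation, with gemtext_filter as the default function name, can be resolved with the standard `importlib` module. This is a sketch of that convention, not levior's loader; `parse_filter_spec` and `load_filter` are hypothetical, and a stdlib function is used in the demo so it runs without levior installed:

```python
import importlib
from typing import Callable, Tuple

DEFAULT_FILTER_NAME = "gemtext_filter"


def parse_filter_spec(spec: str) -> Tuple[str, str]:
    """Split 'package.module:function' into (module, function)."""
    module_name, _, func_name = spec.partition(":")
    # No ":function" part: fall back to the module's default filter name
    return module_name, func_name or DEFAULT_FILTER_NAME


def load_filter(spec: str) -> Callable:
    module_name, func_name = parse_filter_spec(spec)
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


print(parse_filter_spec("levior.filters.links:strip_emailaddrs"))
# ('levior.filters.links', 'strip_emailaddrs')
print(parse_filter_spec("levior.filters.links"))
# ('levior.filters.links', 'gemtext_filter')
print(load_filter("os.path:basename")("/tmp/page.gmi"))  # page.gmi
```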
You can also pass params to your filter. This rule removes all (English) wikipedia URLs and PNG image URLs in the final gemtext:
urules:
  - url: ".*"
    gemtext_filters:
      - filter: levior.filters.links:url_remove
        urls:
          - ^https://en.wikipedia.org
          - \.png$
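A filter in the spirit of url_remove can be sketched as a function that returns True (remove the line) for gemtext link lines whose target matches one of the patterns. The real filter's signature isn't documented here; `url_remove_sketch` is hypothetical and takes each line as a plain string plus the urls param:

```python
import re
from typing import List


def url_remove_sketch(line: str, urls: List[str]) -> bool:
    """Return True (drop the line) if it is a gemtext link to a matching URL."""
    # Gemtext link lines look like: "=> URL optional label"
    if not line.startswith("=>"):
        return False
    parts = line[2:].split(maxsplit=1)
    if not parts:
        return False
    target = parts[0]
    return any(re.search(pattern, target) for pattern in urls)


patterns = [r"^https://en.wikipedia.org", r"\.png$"]
gemtext = [
    "# Results",
    "=> https://en.wikipedia.org/wiki/Gemini Gemini on Wikipedia",
    "=> https://example.org/logo.png Logo",
    "=> https://geminiprotocol.net/ Project Gemini",
]
kept = [line for line in gemtext if not url_remove_sketch(line, patterns)]
print(kept)  # ['# Results', '=> https://geminiprotocol.net/ Project Gemini']
```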
Your filter (which can be a function or a coroutine) can return different value types:
- boolean: if your filter returns True, that gemtext line will be removed (filtered out).
- Line (trimgmi class): If you return a Line object, it will be used to replace the original gemtext line.
- list: If you return a list of Line objects, they will be inserted in place
- str: replace the original gemtext line with this raw string value
- int: If your filter returns a negative integer, that line and everything after it in the document will be removed.
Any other return value type will be ignored.
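The return-value rules above can be summarized as a small dispatcher. This is a sketch, not levior's implementation: `apply_filter` and `demo_filter` are hypothetical, plain strings stand in for trimgmi Line objects, and coroutine filters are not handled:

```python
from typing import Any, Callable, List

Filter = Callable[[str], Any]


def apply_filter(lines: List[str], gemfilter: Filter) -> List[str]:
    """Apply one filter to each line, following the return-value rules."""
    output: List[str] = []
    for line in lines:
        result = gemfilter(line)
        # bool must be checked before int: True/False are ints in Python
        if isinstance(result, bool):
            if not result:
                output.append(line)  # False: keep the line
        elif isinstance(result, int):
            if result < 0:
                break                # negative int: drop this line and the rest
            output.append(line)      # other ints are ignored
        elif isinstance(result, str):
            output.append(result)    # replace with the raw string
        elif isinstance(result, list):
            output.extend(result)    # insert these lines in place
        else:
            output.append(line)      # any other type is ignored
    return output


def demo_filter(line: str) -> Any:
    if line.startswith("=> mailto:"):
        return True
    if line == "## Footer":
        return -1
    if line.startswith("# "):
        return [line, "(filtered copy)"]
    if line == "Some text":
        return "Some edited text"
    return None


doc = ["# Title", "Contact:", "=> mailto:someone@example.org Mail me",
       "Some text", "## Footer", "Copyright"]
print(apply_filter(doc, demo_filter))
# ['# Title', '(filtered copy)', 'Contact:', 'Some edited text']
```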
Check out the filters package to see all the available builtin filters.
OmegaConf resolvers
levior provides a few OmegaConf resolvers (functions that are called when the YAML element is accessed).
random
Returns a random item from a list.
my_proxies:
- http://10.0.1.2:8090
- http://10.0.4.2:8092
- http://10.0.8.4:8094
proxy: ${random:${my_proxies}}
ua_roulette
User Agent roulette.
Returns a random browser user agent string. Takes no argument.
custom_ua_roulette
Custom User Agent roulette.
Returns a random browser user agent string for specific operating systems, browsers and browser engines. The parameters are, in this order:
- Operating system list. e.g: [linux, freebsd]
- Software list (optional). e.g: [firefox, chromium]
- Software engine list (optional). e.g: [webkit,blink]
- Hardware type list (optional). e.g: [mobile]
See the random_user_agent documentation for a list of params.
Note: passing invalid parameters will raise a ValueError exception.
JavaScript rendering
Experimental feature.
Through requests-html, which uses the pyppeteer headless browser automation library, levior can render web pages that contain JavaScript code.
Pass --js on the command line to enable JavaScript rendering. Use js-force to always run JS rendering, even if no scripts were detected on the page.
Note: when you run levior with JS rendering for the first time, pyppeteer downloads a copy of the browser binary it requires (about 300 MB of free disk space is required).
Service modes
- server: serves web content as gemtext, via gemini URLs. When you visit levior's gemini URL (gemini://localhost by default) you'll be asked for a web domain to browse via a gemini input request. You can also simply go to gemini://localhost/{domain} in your Gemini browser, for example gemini://localhost/sr.ht to browse https://sr.ht. The URLs in the HTML pages are rewritten to be routed through the levior server. This mode is compatible with any Gemini browser.
- proxy: in this mode, levior acts as a proxy for http and https URLs and serves pages without rewriting URLs. To use this mode, you need a Gemini browser that supports http proxies. Here's a list of browsers supporting proxies: Gemalaya (bundles and uses levior in proxy mode by default), Lagrange, Amfora, diohsc and Telescope.
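The exact rewriting scheme used in server mode isn't documented here. Based on the gemini://localhost/{domain} convention (gemini://localhost/sr.ht serves https://sr.ht), a plausible mapping in both directions looks like this; `to_levior_url` and `to_web_url` are hypothetical, and mapping everything back to https is an assumption:

```python
from urllib.parse import urlsplit

LEVIOR_BASE = "gemini://localhost"  # default server URL


def to_levior_url(web_url: str) -> str:
    """Rewrite an http(s) link so it is routed through levior's server mode."""
    parts = urlsplit(web_url)
    if parts.scheme not in ("http", "https"):
        return web_url  # leave gemini://, mailto:, etc. untouched
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{LEVIOR_BASE}/{parts.netloc}{path}{query}"


def to_web_url(levior_path: str) -> str:
    """Map a request path like /sr.ht/projects back to https://sr.ht/projects."""
    domain, _, rest = levior_path.lstrip("/").partition("/")
    return f"https://{domain}/{rest}"


print(to_levior_url("https://sr.ht/projects?sort=recent"))
# gemini://localhost/sr.ht/projects?sort=recent
print(to_web_url("/sr.ht/projects"))  # https://sr.ht/projects
```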
The allowed modes can be set with the --mode (or -m) command-line argument or with the mode setting in the config file. Use --mode=proxy to run only as a transparent http proxy, or --mode=server to only serve requests made with gemini URLs.
Use --mode=proxy,server to handle both request types (this is the default).
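A sketch of how a --mode value could be parsed and used to decide which requests are handled, assuming the decision is based on the scheme of the requested URL (http/https for proxy mode, gemini for server mode). `parse_modes` and `accepts` are hypothetical, not levior's code:

```python
from typing import FrozenSet
from urllib.parse import urlsplit

VALID_MODES = {"proxy", "server"}


def parse_modes(value: str = "proxy,server") -> FrozenSet[str]:
    """Parse a --mode value such as 'proxy', 'server' or 'proxy,server'."""
    modes = frozenset(m.strip() for m in value.split(",") if m.strip())
    unknown = modes - VALID_MODES
    if unknown:
        raise ValueError(f"unknown mode(s): {', '.join(sorted(unknown))}")
    return modes


def accepts(request_url: str, modes: FrozenSet[str]) -> bool:
    """Decide whether a request is handled, based on its URL scheme."""
    scheme = urlsplit(request_url).scheme
    if scheme in ("http", "https"):
        return "proxy" in modes   # browser is using levior as an HTTP proxy
    if scheme == "gemini":
        return "server" in modes  # gemini:// request to levior itself
    return False


only_proxy = parse_modes("proxy")
print(accepts("https://sr.ht/", only_proxy))            # True
print(accepts("gemini://localhost/sr.ht", only_proxy))  # False
```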
Configuring your Gemini browser to use levior as a proxy
Lagrange
In the File menu, select Preferences, and go to the Network section. Set the HTTP proxy text field to 127.0.0.1:1965. If you're not running levior on localhost, set it to levior's listening IP and port.
Telescope
As explained in the docs, edit ~/.config/telescope/config and add the following:
Links
The --links option controls the Gemini links generation mode (this is an md2gemini option):
- paragraph (this is the default): footnotes are added to the document, and the links for each footnote are added at the end of each paragraph
- copy: Like paragraph, but without footnotes
- at-end: The links are added at the very end of the document
- off: Remove all links
Open your Gemini browser and go to gemini://localhost or //localhost.
Mounting ZIM images
You can also mount ZIM files to be served via the gemini protocol. Once you've configured a ZIM mountpoint, go to gemini://localhost/{mountpoint} (for example: gemini://localhost/wiki_en). A great source of ZIM archives is the kiwix library.
It's possible to run searches on the ZIM archive's contents. Go to gemini://localhost/{mountpoint}/search (for example: gemini://localhost/wiki_en/search), where you'll be prompted for a search query. By default, results are limited to 4096; this can be changed with the search_results_max option. The search_path option sets the URL path of the search API:
mount:
  /wiki_en:
    type: zim
    path: ./wikipedia_en_all_mini_2022-03.zim
    search_path: /
    search_results_max: 8192
See the example config file here.
Server endpoints
/
The homepage lists links to the main endpoints, the mountpoints and the aggregated RSS/Atom feeds.
/goto
When accessing /goto, or /go, you'll be prompted for a domain name or a full URL to browse.
/{domain}
When accessing /{domain}, levior will proxy https://{domain} to the Gemini browser. Examples:
/access_log
Shows the proxy's access log.
/cache
Lists the objects stored in the cache.
/graph
RDF graph index
/graph/search
RDF graph search endpoint
/search
When accessing /search, you'll be prompted for a search query. Your search will be performed via the searx search engine.