Erik Johnston
9ccb4226ba
Delete expired url cache data
2017-09-28 12:18:06 +01:00
Erik Johnston
7fe8ed1787
Store URL cache preview downloads seperately
...
This makes it easier to clear old media out at a later date
2017-06-23 11:14:11 +01:00
Erik Johnston
b8b936a6ea
Add API to quarantine media
2017-06-19 17:39:21 +01:00
Erik Johnston
48d2949416
Throw exception when not retrying when downloading media
2017-06-13 10:23:14 +01:00
Matthew Hodgson
836d5c44b6
actually trim oversize og:description meta
2017-05-22 21:14:20 +01:00
Erik Johnston
d12ae7fd1c
Don't log exceptions for NotRetryingDestination
2017-05-15 15:42:18 +01:00
Richard van der Hoff
1d09586599
Address review comments
...
- don't blindly proxy all HTTPRequestExceptions
- log unexpected exceptions at error
- avoid `isinstance`
- improve docs on `from_http_response_exception`
2017-03-14 14:15:37 +00:00
Richard van der Hoff
170ccc9de5
Fix routing loop when fetching remote media
...
When we proxy a media request to a remote server, add a query-param, which will
tell the remote server to 404 if it doesn't recognise the server_name.
This should fix a routing loop where the server keeps forwarding back to
itself.
Also improves the error handling on remote media fetches, so that we don't
always return a rather obscure 502.
2017-03-13 16:30:36 +00:00
Jurek
aea5461488
Fix dynamic thumbnails aspect
2017-02-24 22:43:27 +01:00
Mark Haines
32019c9897
Log which files we saved attachments to in the media_repository
2017-01-10 14:19:50 +00:00
Erik Johnston
f7085ac84f
Name linearizer's for better logs
2017-01-09 17:17:10 +00:00
Marcin Bachry
24c16fc349
Fix crash in url preview when html tag has no text
...
Signed-off-by: Marcin Bachry <hegel666@gmail.com>
2016-12-14 22:38:18 +01:00
Johannes Löthberg
32c8b5507c
preview_url_resource: Ellipsis must be in unicode string
...
Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
2016-12-01 13:12:13 +01:00
Mark Haines
b1c27975d0
Set CORs headers on responses from the media repo
2016-11-02 11:29:25 +00:00
Erik Johnston
d51b8a1674
Add quotes and be explicity about script-src
2016-09-05 17:35:01 +01:00
Erik Johnston
662b031a30
Allow PDF to be rendered from media repo
2016-09-05 17:25:26 +01:00
Erik Johnston
0af9e1a637
Set Content-Security-Policy
on media repo
...
This is to inform browsers that they should sandbox the returned
media. This is particularly cruical for javascript/HTML files.
2016-08-17 16:27:39 +01:00
Erik Johnston
f90b3d83a3
Add None check to _iterate_over_text
2016-08-17 15:17:17 +01:00
Erik Johnston
109a560905
Flake8
2016-08-16 14:57:21 +01:00
Erik Johnston
48b5829aea
Fix up preview URL API. Add tests.
...
This includes:
- Splitting out methods of a class into stand alone functions, to make
them easier to test.
- Adding unit tests to split out functions, testing HTML -> preview.
- Handle the fact that elements in lxml may have tail text.
2016-08-16 14:53:24 +01:00
Erik Johnston
5bcccfde6c
Don't include html comments in description
2016-08-05 14:45:11 +01:00
Erik Johnston
b5525c76d1
Typo
2016-08-04 16:10:08 +01:00
Erik Johnston
e97648c4e2
Test summarization
2016-08-04 16:09:09 +01:00
Erik Johnston
58c9653c6b
Don't infer paragrahs from newlines
2016-08-02 18:50:24 +01:00
Erik Johnston
6b58ade2f0
Comment on why we clone
2016-08-02 18:41:22 +01:00
Erik Johnston
9e66c58ceb
Spelling.
2016-08-02 18:37:31 +01:00
Erik Johnston
f83f5fbce8
Make it actually compile
2016-08-02 18:32:42 +01:00
Erik Johnston
aecaec3e10
Change the way we summarize URLs
...
Using XPath is slow on some machines (for unknown reasons), so use a
different approach to get a list of text nodes.
Try to generate a summary that respect paragraph and then word
boundaries, adding ellipses when appropriate.
2016-08-02 18:25:53 +01:00
Erik Johnston
f52cb4cd78
Remove race
2016-06-29 15:24:50 +01:00
Erik Johnston
a70688445d
Implement purge_media_cache admin API
2016-06-29 14:57:59 +01:00
Erik Johnston
314b146b2e
Track approximate last access time for remote media
2016-06-29 11:41:20 +01:00
Mark Haines
13e334506c
Remove the legacy v0 content upload API.
...
The existing content can still be downloaded. The last upload to the
matrix.org server was in January 2015, so it is probably safe to remove
the upload API.
2016-06-21 11:47:39 +01:00
Erik Johnston
09a17f965c
Line lengths
2016-06-15 16:58:12 +01:00
Erik Johnston
1e9026e484
Handle floats as img widths
2016-06-15 16:58:05 +01:00
Erik Johnston
a60169ea09
Handle og props with not content
2016-06-15 16:57:48 +01:00
Erik Johnston
eba4ff1bcb
502 on /thumbnail when can't contact remote server
2016-06-09 11:29:43 +01:00
Mark Haines
eb79110beb
Clean up the blacklist/whitelist handling.
...
Always set the config key with an empty list, even if a list isn't specified.
This means that the codepaths are the same for both the empty list and
for a missing key. Since the behaviour is the same for both cases this
makes the code somewhat easier to reason about.
2016-05-16 13:03:59 +01:00
Mark Haines
8d7ad44331
Report per request metrics for all of the things using request_handler
2016-04-28 10:57:49 +01:00
Erik Johnston
e8884e5e9c
Add self.media_repo to PreviewUrlResource
2016-04-19 14:51:34 +01:00
Erik Johnston
a7001c311b
_make_dirs was moved to MediaRepository
2016-04-19 14:49:31 +01:00
Erik Johnston
9181e2f4c7
Add store to PreviewUrlResource
2016-04-19 14:48:24 +01:00
Erik Johnston
fb76a81ff7
Reorder imports
2016-04-19 14:45:05 +01:00
Erik Johnston
0c93df89b6
Move MediaRepository to media_repository module
2016-04-19 11:31:43 +01:00
Erik Johnston
43f0941e8f
Split out BaseMediaResource into MediaRepository
...
This is so that a single MediaRepository can be shared across all
resources, rather than having a "copy" per resource.
In particular this allows us to guard against both the thumbnail and
download resource triggering a download of remote content at the same
time.
2016-04-19 11:24:59 +01:00
Matthew Hodgson
aaabbd3e9e
explicitly pass in the charset from Content-Type to lxml to fix cyrillic woes better
2016-04-15 14:32:25 +01:00
Matthew Hodgson
84f9cac4d0
fix cyrillic URL previews by hardcoding all page decoding to UTF-8 for now, rather than relying on lxml's heuristics which seem to get it wrong
2016-04-15 13:20:08 +01:00
Matthew Hodgson
f78b479118
fix urlparse import thinko breaking tiny URLs
2016-04-14 15:23:55 +01:00
Matthew Hodgson
bd77216d06
comment out 2c838f6459
due to risk of https://en.wikipedia.org/wiki/Billion_laughs attacks - thanks @torhve
2016-04-14 14:39:24 +01:00
Erik Johnston
d0633e6dbe
Sanitize the optional dependencies for spider API
2016-04-13 13:38:09 +01:00
Erik Johnston
17515bae14
PEP8
2016-04-11 11:02:50 +01:00