Richard van der Hoff
d5352cbba8
Handle url_previews with no content-type
...
avoid failing with an exception if the remote server doesn't give us a
Content-Type header.
Also, clean up the exception handling a bit.
2018-02-02 00:53:46 +00:00
Erik Johnston
6368e5c0ab
Change _generate_thumbnails to take media_type
2018-01-16 16:17:38 +00:00
Erik Johnston
0a90d9ede4
Move setting of file_id up to caller
2018-01-16 16:03:05 +00:00
Erik Johnston
2442e9876c
Make PreviewUrlResource use MediaStorage
2018-01-09 16:15:07 +00:00
Richard van der Hoff
5a4da5bf78
Merge pull request #2697 from matrix-org/rav/fix_urlcache_index_error
...
Fix error on sqlite 3.7
2017-11-27 12:25:48 +00:00
Richard van der Hoff
8132a6b7ac
Fix OPTIONS on preview_url
...
Fixes #2706
2017-11-23 17:52:31 +00:00
Richard van der Hoff
2908f955d1
Check database in has_completed_background_updates
...
so that the right thing happens on workers.
2017-11-22 18:02:15 +00:00
Richard van der Hoff
7098b65cb8
Fix error on sqlite 3.7
...
Create the url_cache index on local_media_repository as a background update, so
that we can detect whether we are on sqlite or not and create a partial or
complete index accordingly.
To avoid running the cleanup job before we have built the index, add a bailout
which will defer the cleanup if the bg updates are still running.
Fixes https://github.com/matrix-org/synapse/issues/2572 .
2017-11-21 11:14:17 +00:00
Richard van der Hoff
5d15abb120
Bit more logging
2017-11-10 16:58:04 +00:00
Richard van der Hoff
46790f50cf
Cache failures in url_preview handler
...
Reshuffle the caching logic in the url_preview handler so that failures are
cached (and to generally simplify things and fix the logcontext leaks).
2017-11-10 16:50:50 +00:00
Maxime Vaillancourt
5287e57c86
Ignore noscript tags when generating URL previews
2017-10-25 20:44:34 -04:00
Richard van der Hoff
eaaabc6c4f
replace 'except:' with 'except Exception:'
...
what could possibly go wrong
2017-10-23 15:52:32 +01:00
Erik Johnston
2b24416e90
Don't reuse source but instead copy from primary media store to backup
2017-10-13 14:11:34 +01:00
Erik Johnston
505371414f
Fix up thumbnailing function
2017-10-13 11:23:53 +01:00
Erik Johnston
d76621a47b
Fix comments
2017-10-12 18:16:25 +01:00
Erik Johnston
802ca12d05
Don't close file prematurely
2017-10-12 17:37:21 +01:00
Erik Johnston
e283b555b1
Copy everything to backup
2017-10-12 17:31:24 +01:00
Erik Johnston
d5694ac5fa
Only log if we've removed media
2017-09-28 16:08:08 +01:00
Erik Johnston
7cc483aa0e
Clear up expired url cache every 10s
2017-09-28 13:56:53 +01:00
Erik Johnston
e1e7d76cf1
Actually assign result to variable
2017-09-28 13:55:29 +01:00
Erik Johnston
5f501ec7e2
Fix typo in url cache expiry timer
2017-09-28 12:59:01 +01:00
Erik Johnston
ae79764fe5
Change expires column to expires_ts
2017-09-28 12:37:53 +01:00
Erik Johnston
9ccb4226ba
Delete expired url cache data
2017-09-28 12:18:06 +01:00
Erik Johnston
7fe8ed1787
Store URL cache preview downloads seperately
...
This makes it easier to clear old media out at a later date
2017-06-23 11:14:11 +01:00
Matthew Hodgson
836d5c44b6
actually trim oversize og:description meta
2017-05-22 21:14:20 +01:00
Marcin Bachry
24c16fc349
Fix crash in url preview when html tag has no text
...
Signed-off-by: Marcin Bachry <hegel666@gmail.com>
2016-12-14 22:38:18 +01:00
Johannes Löthberg
32c8b5507c
preview_url_resource: Ellipsis must be in unicode string
...
Signed-off-by: Johannes Löthberg <johannes@kyriasis.com>
2016-12-01 13:12:13 +01:00
Erik Johnston
f90b3d83a3
Add None check to _iterate_over_text
2016-08-17 15:17:17 +01:00
Erik Johnston
109a560905
Flake8
2016-08-16 14:57:21 +01:00
Erik Johnston
48b5829aea
Fix up preview URL API. Add tests.
...
This includes:
- Splitting out methods of a class into stand alone functions, to make
them easier to test.
- Adding unit tests to split out functions, testing HTML -> preview.
- Handle the fact that elements in lxml may have tail text.
2016-08-16 14:53:24 +01:00
Erik Johnston
5bcccfde6c
Don't include html comments in description
2016-08-05 14:45:11 +01:00
Erik Johnston
b5525c76d1
Typo
2016-08-04 16:10:08 +01:00
Erik Johnston
e97648c4e2
Test summarization
2016-08-04 16:09:09 +01:00
Erik Johnston
58c9653c6b
Don't infer paragrahs from newlines
2016-08-02 18:50:24 +01:00
Erik Johnston
6b58ade2f0
Comment on why we clone
2016-08-02 18:41:22 +01:00
Erik Johnston
9e66c58ceb
Spelling.
2016-08-02 18:37:31 +01:00
Erik Johnston
f83f5fbce8
Make it actually compile
2016-08-02 18:32:42 +01:00
Erik Johnston
aecaec3e10
Change the way we summarize URLs
...
Using XPath is slow on some machines (for unknown reasons), so use a
different approach to get a list of text nodes.
Try to generate a summary that respect paragraph and then word
boundaries, adding ellipses when appropriate.
2016-08-02 18:25:53 +01:00
Erik Johnston
09a17f965c
Line lengths
2016-06-15 16:58:12 +01:00
Erik Johnston
1e9026e484
Handle floats as img widths
2016-06-15 16:58:05 +01:00
Erik Johnston
a60169ea09
Handle og props with not content
2016-06-15 16:57:48 +01:00
Mark Haines
eb79110beb
Clean up the blacklist/whitelist handling.
...
Always set the config key with an empty list, even if a list isn't specified.
This means that the codepaths are the same for both the empty list and
for a missing key. Since the behaviour is the same for both cases this
makes the code somewhat easier to reason about.
2016-05-16 13:03:59 +01:00
Mark Haines
8d7ad44331
Report per request metrics for all of the things using request_handler
2016-04-28 10:57:49 +01:00
Erik Johnston
e8884e5e9c
Add self.media_repo to PreviewUrlResource
2016-04-19 14:51:34 +01:00
Erik Johnston
a7001c311b
_make_dirs was moved to MediaRepository
2016-04-19 14:49:31 +01:00
Erik Johnston
9181e2f4c7
Add store to PreviewUrlResource
2016-04-19 14:48:24 +01:00
Erik Johnston
fb76a81ff7
Reorder imports
2016-04-19 14:45:05 +01:00
Erik Johnston
43f0941e8f
Split out BaseMediaResource into MediaRepository
...
This is so that a single MediaRepository can be shared across all
resources, rather than having a "copy" per resource.
In particular this allows us to guard against both the thumbnail and
download resource triggering a download of remote content at the same
time.
2016-04-19 11:24:59 +01:00
Matthew Hodgson
aaabbd3e9e
explicitly pass in the charset from Content-Type to lxml to fix cyrillic woes better
2016-04-15 14:32:25 +01:00
Matthew Hodgson
84f9cac4d0
fix cyrillic URL previews by hardcoding all page decoding to UTF-8 for now, rather than relying on lxml's heuristics which seem to get it wrong
2016-04-15 13:20:08 +01:00