Erik Johnston
eba4ff1bcb
502 on /thumbnail when can't contact remote server
2016-06-09 11:29:43 +01:00
Mark Haines
eb79110beb
Clean up the blacklist/whitelist handling.
...
Always set the config key with an empty list, even if a list isn't specified.
This means that the codepaths are the same for both the empty list and
for a missing key. Since the behaviour is the same for both cases this
makes the code somewhat easier to reason about.
2016-05-16 13:03:59 +01:00
Mark Haines
8d7ad44331
Report per request metrics for all of the things using request_handler
2016-04-28 10:57:49 +01:00
Erik Johnston
e8884e5e9c
Add self.media_repo to PreviewUrlResource
2016-04-19 14:51:34 +01:00
Erik Johnston
a7001c311b
_make_dirs was moved to MediaRepository
2016-04-19 14:49:31 +01:00
Erik Johnston
9181e2f4c7
Add store to PreviewUrlResource
2016-04-19 14:48:24 +01:00
Erik Johnston
fb76a81ff7
Reorder imports
2016-04-19 14:45:05 +01:00
Erik Johnston
0c93df89b6
Move MediaRepository to media_repository module
2016-04-19 11:31:43 +01:00
Erik Johnston
43f0941e8f
Split out BaseMediaResource into MediaRepository
...
This is so that a single MediaRepository can be shared across all
resources, rather than having a "copy" per resource.
In particular this allows us to guard against both the thumbnail and
download resource triggering a download of remote content at the same
time.
2016-04-19 11:24:59 +01:00
Matthew Hodgson
aaabbd3e9e
explicitly pass in the charset from Content-Type to lxml to fix cyrillic woes better
2016-04-15 14:32:25 +01:00
Matthew Hodgson
84f9cac4d0
fix cyrillic URL previews by hardcoding all page decoding to UTF-8 for now, rather than relying on lxml's heuristics which seem to get it wrong
2016-04-15 13:20:08 +01:00
Matthew Hodgson
f78b479118
fix urlparse import thinko breaking tiny URLs
2016-04-14 15:23:55 +01:00
Matthew Hodgson
bd77216d06
comment out 2c838f6459
due to risk of https://en.wikipedia.org/wiki/Billion_laughs attacks - thanks @torhve
2016-04-14 14:39:24 +01:00
Erik Johnston
d0633e6dbe
Sanitize the optional dependencies for spider API
2016-04-13 13:38:09 +01:00
Erik Johnston
17515bae14
PEP8
2016-04-11 11:02:50 +01:00
Matthew Hodgson
5ffacc5e84
fix typos and needless try/except from PR review
2016-04-11 10:39:16 +01:00
Matthew Hodgson
83b2f83da0
actually throw meaningful errors
2016-04-08 21:36:59 +01:00
Mark Haines
b36270b5e1
Fix pep8 warning
2016-04-08 19:52:23 +01:00
Matthew Hodgson
1ccabe2965
more PR feedback
2016-04-08 18:58:08 +01:00
Matthew Hodgson
dafef5a688
Add url_preview_enabled config option to turn on/off preview_url endpoint. defaults to off.
...
Add url_preview_ip_range_blacklist to let admins specify internal IP ranges that must not be spidered.
Add url_preview_url_blacklist to let admins specify URL patterns that must not be spidered.
Implement a custom SpiderEndpoint and associated support classes to implement url_preview_ip_range_blacklist
Add commentary and generally address PR feedback
2016-04-08 18:37:15 +01:00
Matthew Hodgson
cf51c4120e
report image size (bytewise) in OG meta
2016-04-03 23:57:05 +01:00
Matthew Hodgson
0834b152fb
char encoding
2016-04-03 12:59:27 +01:00
Matthew Hodgson
8b98a7e8c3
pep8
2016-04-03 12:56:29 +01:00
Matthew Hodgson
eab4d462f8
fix etag typing error. fix timestamp typing error
2016-04-03 02:02:46 +01:00
Matthew Hodgson
c3916462f6
rebase all image URLs
2016-04-03 01:33:12 +01:00
Matthew Hodgson
110780b18b
remove stale todo
2016-04-03 00:48:31 +01:00
Matthew Hodgson
b09e29a03c
Ensure only one download for a given URL is active at a time
2016-04-03 00:47:40 +01:00
Matthew Hodgson
7426c86eb8
add a persistent cache of URL lookups, and fix up the in-memory one to work
2016-04-03 00:31:57 +01:00
Matthew Hodgson
d1b154a10f
support gzip compression, and don't pass through error msgs
2016-04-02 03:06:39 +01:00
Matthew Hodgson
9377157961
how was _respond_default_thumbnail ever meant to work?
2016-04-02 02:31:45 +01:00
Matthew Hodgson
2c838f6459
pass back SVGs as their own thumbnails
2016-04-02 02:30:07 +01:00
Matthew Hodgson
5037ee0d37
handle missing dimensions without crashing
2016-04-02 02:29:57 +01:00
Matthew Hodgson
b26e8604f1
make meta comparisons case insensitive
2016-04-02 01:35:44 +01:00
Matthew Hodgson
5fd07da764
refactor calc_og; spider image URLs; fix xpath; add a (broken) expiringcache; loads of other fixes
2016-04-02 00:35:49 +01:00
Matthew Hodgson
c60b751694
fix assorted redirect, unicode and screenscraping bugs
2016-04-01 02:17:48 +01:00
Matthew Hodgson
683e564815
handle spidered relative images correctly
2016-03-31 23:52:58 +01:00
Matthew Hodgson
72550c3803
prevent choking on invalid utf-8, and handle image thumbnailing smarter
2016-03-31 15:14:14 +01:00
Matthew Hodgson
bb9a2ca87c
synthesise basig OG metadata from pages lacking it
2016-03-31 14:15:09 +01:00
Matthew Hodgson
a8a5dd3b44
handle requests with missing content-length headers (e.g. YouTube)
2016-03-31 01:55:21 +01:00
Matthew Hodgson
ae5831d303
fix bugs
2016-03-29 03:32:55 +01:00
Matthew Hodgson
19038582d3
debug
2016-03-29 03:14:16 +01:00
Matthew Hodgson
64b4aead15
make it work
2016-03-29 03:13:25 +01:00
Matthew Hodgson
dd4287ca5d
make it build
2016-03-29 02:07:57 +01:00
Matthew Hodgson
d9d48aad2d
Merge branch 'develop' into matthew/preview_urls
2016-03-27 22:54:42 +01:00
Mark Haines
58c9f20692
Catch the exceptions thrown by twisted when you write to a closed connection
2016-02-12 13:46:59 +00:00
Erik Johnston
33c71c3a4b
Preserve log context over when deferring to thread pool in media repo
2016-02-03 16:17:18 +00:00
Matthew Hodgson
7dd0c1730a
initial WIP of a tentative preview_url endpoint - incomplete, untested, experimental, etc. just putting it here for safekeeping for now
2016-01-24 18:47:27 -05:00
Daniel Wagner-Hall
42aa1f3f33
Merge pull request #478 from matrix-org/daniel/userobject
...
Introduce a User object
I'm sick of passing around more and more things as tuple items around
the whole world, and needing to edit every call site every time there is
more information about a user. So pass them around together as an
object.
This object has incredibly poorly named fields because we have a
convention that `user` indicates a UserID object, and `user_id`
indicates a string. I tried to clean up the whole repo to fix this, but
gave up. So instead, I introduce a second convention. A user_object is a
User, and a user_id_object is a UserId. I may have cried a little bit.
2016-01-11 17:50:22 +00:00
Daniel Wagner-Hall
2110e35fd6
Introduce a Requester object
...
This tracks data about the entity which made the request. This is
instead of passing around a tuple, which requires call-site
modifications every time a new piece of optional context is passed
around.
I tried to introduce a User object. I gave up.
2016-01-11 17:48:45 +00:00
Mark Haines
8677b7d698
Only use cropped thumbnails when asked for a cropped thumbnail.
...
Even though ones cropped with scale might be technically valid.
2016-01-07 18:57:15 +00:00