Compare commits

38 Commits

| Author | SHA1 | Date |
| --- | --- | --- |
| df | 5183884250 | |
| df | f04341939d | |
| df | cebc7be09a | |
| df | 3cc7e72d9c | |
| df | 35bf1f5971 | |
| df | 0345a064a7 | |
| df | 2d714ecb56 | |
| df | 0ad58e61a7 | |
| df | 0b9bac2d45 | |
| df | 635225fa3f | |
| dirkf | ab196ce69e | |
| Sergey M․ | 208509b528 | |
| df | e3a336bf4e | |
| df | 56fc561c8d | |
| df | 216c65d467 | |
| df | 42f1bb6506 | |
| df | 8524dd70f9 | |
| df | b094d67002 | |
| df | e23abe407e | |
| df | 92e0b02ec2 | |
| dirkf | 40152ecb68 | |
| df | 081846a711 | |
| df | 1f3bdf9ad8 | |
| dirkf | 200b9eebb3 | |
| df | f3a33f91e2 | |
| df | a4cbe8f909 | |
| df | 1189422cd0 | |
| df | 62c225ef44 | |
| df | beb803cd3b | |
| df | 779663d086 | |
| df | 5d2bc1461c | |
| df | bf9254077b | |
| df | d8e6815fef | |
| df | 9bcc47eef0 | |
| df | 4fc9148ab7 | |
| df | f800b76250 | |
| df | 90745d224b | |
| df | 48d5aba7b6 | |
@@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.06.06. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of all, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.04.26. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -19,7 +19,7 @@ labels: 'site-support-request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.06.06. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of all, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.04.26. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.06.06. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of all, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.04.26. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
@@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.06.06. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of all, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.04.26. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -19,7 +19,7 @@ labels: 'request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.06.06. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of all, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.04.26. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
@@ -0,0 +1 @@
*.sav
@@ -0,0 +1,45 @@
#
# Makefile for Hummy youtube-dl package
#
DISTPKG := python2.7/dist-packages

.PHONY: all clean opkg clean-opk clean-opkg release opkg-bin opkg-lib opkg-sav

MV ?= mv -f

define nl


endef

clean-opk: $(wildcard *.opk)
	-$(RM) $?

clean-opkg:
	-$(RM) -r opkg/bin
	-$(RM) -r opkg/lib

clean: clean-opk clean-opkg

opkg-lib: ../youtube_dl
	-$(RM) -r opkg/lib/$(DISTPKG)
	install -d opkg/lib/$(DISTPKG)
	cp -rpP $< opkg/lib/$(DISTPKG)

opkg-bin: $(wildcard ../bin/*) $(wildcard bin/*)
	install -d opkg/bin
	install -p $? opkg/bin

opkg-sav: $(wildcard *.opk)
	$(foreach opk,$^,-$(MV) "$(opk)" "$(opk).sav"$(nl))

opkg: $(wildcard opkg/CONTROL/*) opkg-bin opkg-lib opkg-sav
	opkg-pack opkg
	-for opk in youtube-dl*.opk; do $(RM) "$${opk}.sav"; done

release: opkg
	tagname="$$(for opk in youtube-dl*.opk; do echo "$$opk"; break; done)" && \
	tagname="$${tagname%_*.*}" && \
	test -n "$${tagname}" && \
	git tag -f -a -m "Release $${tagname}" "$${tagname}"
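The `release` target above derives the git tag by stripping the architecture and extension suffix from the first `.opk` filename with the shell expansion `${tagname%_*.*}` (remove the shortest trailing match of `_*.*`). A rough Python sketch of that expansion, with a hypothetical helper name:

```python
import re

def tag_from_opk(opk_name):
    """Mimic the POSIX expansion ${name%_*.*}: drop the shortest
    trailing '_<something>.<something>' suffix from the filename."""
    return re.sub(r'_[^_]*\.[^.]*$', '', opk_name)

# e.g. an opkg named <package>_<version>_<arch>.opk becomes the tag name
print(tag_from_opk('youtube-dl_2021.06.06.1_mipsel.opk'))  # youtube-dl_2021.06.06.1
```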
@@ -0,0 +1,14 @@
## Build and installation scripts for the Hummy youtube-dl package

# make clean

Remove build artefacts.

# make opkg

Create package using youtube-dl files from parent directory.

# make release

Tag the release with the version of the package made.
@@ -0,0 +1,26 @@
#!/bin/sh
# Usage: fixsttl media_file
# for a media file, convert its .locale.srt, if any, to plain text .srt

# can be overridden: STTL_LANG=da-DK, etc
STTL_LANG=${STTL_LANG:-en-GB}

main() {
	local ext froot srt
	[ -n "$1" ] || exit
	# any other extensions?
	for ext in mp4 mpg mkv; do
		froot=${1%.$ext}
		[ "$1" != "$froot" ] && break
	done
	[ "$1" = "$froot" ] && return 1
	srt=${froot}.${STTL_LANG}.srt
	[ -r "$srt" ] || return
	# *.en-GB.srt -> *.srt
	iconv -f UTF-8 -t LATIN1 "$srt" |
	# strip <tags> and </tags>
	sed -r -e 's@<[/a-zA-Z]+( [^>]*)?>@@g' > "${froot}.srt" &&
	{ rm -f -- "$srt"; return 0; }
}

main "$@"
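The `sed` expression in `fixsttl` removes simple subtitle markup such as `<font …>` and `</font>`. A minimal Python sketch of the same pattern (the function name here is illustrative, not from the source):

```python
import re

def strip_srt_tags(text):
    """Rough equivalent of the script's sed expression: delete
    <tag ...> and </tag> markup from subtitle text."""
    return re.sub(r'<[/a-zA-Z]+( [^>]*)?>', '', text)

print(strip_srt_tags('<font color="white">Hello</font> world'))  # Hello world
```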
@@ -0,0 +1,55 @@
#!/bin/sh
# scrape iPlayer programme URLs from a BBC web page

# args: [--queue|-q] iplayer_series_url

mung_url()
{ # prefix
	local url
	while read url; do
		url=${url##href=\"}
		echo $1${url%%\"}
	done
}

case $1 in

--queue|-q)
	if which qtube >/dev/null; then
		qqq() {
			while read -r line; do
				qtube "$@" "$line"
			done
		}
	else
		printf "No qtube program is installed; listing qtube commands\n" >&2
		qqq() {
			while read -r line; do
				echo qtube "$@" $(printf "'%s'" "$line")
			done
		}
	fi
	shift
	;;

--help|-h) {
	printf "Usage:\n\n%s [--queue|-q] iplayer_series_url\n\n" "${0##*/}"
	printf "Extract iPlayer programme URLs from series page and pass to youtube-dl.\n\n"
	printf "With queue option, instead try to queue each URL for download.\n\n"
	} 1>&2
	exit
	;;

*) qqq() { youtube -a -; }
	;;

esac

# get BBC's base address
bbc="$1"; bbc="${bbc%%/iplayer*}"

# parse the web page for episode URLs, extract and prepare them for youtube-dl
# curl: -k insecure, needed due to Humax's old SSL libs; -s silent; -S show errors anyway
# grep: -o print matching substring, -E match extended regular expression
curl -k -s -S $1 | grep -oE "href=('|\")/iplayer/episode/[^'\"]+\\1" | mung_url $bbc | \
	sort | uniq | qqq
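The `curl | grep | mung_url | sort | uniq` pipeline above can be sketched in Python as a single function. This is an illustrative approximation (the base URL and function name are assumptions, not from the source):

```python
import re

def iplayer_urls(html, base='https://www.bbc.co.uk'):
    """Pull /iplayer/episode/... hrefs out of a page, prefix the
    site's base address, and de-duplicate — like the shell pipeline."""
    hrefs = re.findall(r'''href=(['"])(/iplayer/episode/[^'"]+)\1''', html)
    return sorted({base + path for _, path in hrefs})

html = '<a href="/iplayer/episode/abc/one"></a><a href="/iplayer/episode/abc/one"></a>'
print(iplayer_urls(html))  # ['https://www.bbc.co.uk/iplayer/episode/abc/one']
```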
@@ -0,0 +1,2 @@
#!/bin/sh
python /mod/lib/python2.7/dist-packages/youtube_dl "$@"
@@ -0,0 +1,8 @@
Package: youtube-dl
Priority: optional
Section: misc
Version: 2021.06.06.1
Architecture: mipsel
Maintainer: prpr
Depends: ffmpeg(>=4.1),wget(>=1.20),python,libiconv
Description: Download videos from youtube.com or other video platforms
|
|||
#!/bin/sh
|
||||
|
||||
distpkgs=/mod/lib/python2.7/dist-packages
|
||||
|
||||
cfgfile='/mod/etc/youtube-dl.conf'
|
||||
oldcfgfile="${cfgfile}.old"
|
||||
|
||||
# the default settings
|
||||
def_settings() {
|
||||
cat << EOM
|
||||
--restrict-filenames
|
||||
--prefer-ffmpeg
|
||||
-f|--format "best[height<=?1080][fps<=?30]"
|
||||
-o|--output "$outdir/%(title)s.%(ext)s"
|
||||
EOM
|
||||
}
|
||||
|
||||
is_set() { # option
|
||||
# -w whole words -q just set return code -E extended regexp
|
||||
[ -f "$cfgfile" ] && grep -wq -E -e "($1)" "$cfgfile"
|
||||
}
|
||||
|
||||
case "$(cat /etc/model)" in
|
||||
HDR)
|
||||
outdir='/mnt/hd2/My Video'
|
||||
;;
|
||||
*) # HD
|
||||
outdir='/media/drive1/Video'
|
||||
;;
|
||||
esac
|
||||
|
||||
settings="$(mktemp)"
|
||||
def_settings |
|
||||
while read opt val; do
|
||||
# only add settings that aren't already set
|
||||
if ! is_set "$opt"; then
|
||||
echo "${opt%%|*}" "$val" >>"$settings"
|
||||
fi
|
||||
done
|
||||
if [ -s "$settings" ]; then
|
||||
if [ -f "$cfgfile" ]; then
|
||||
cp "$cfgfile" "$oldcfgfile"
|
||||
echo "Your youtube-dl settings file has been updated and"
|
||||
echo "the previous settings file saved as $oldcfgfile"
|
||||
fi
|
||||
cat "$settings" >>"$cfgfile"
|
||||
fi
|
||||
rm "$settings"
|
||||
sed -i 's/fps<=?30/fps<=?60/' "$cfgfile"
|
||||
|
||||
# make python recognise the distribution pkg directory
|
||||
patch_python() {
|
||||
profile=/mod/etc/profile/python
|
||||
if ! grep -qF "$distpkgs" "$profile"; then
|
||||
printf 'export PYTHONPATH="%s"\n' "$distpkgs" >> "$profile"
|
||||
printf "\nLog out and in again to set PYTHONPATH\n\n"
|
||||
fi
|
||||
}
|
||||
patch_python
|
||||
|
||||
find "${distpkgs}/youtube_dl" -name '*.pyc' -exec rm -f "{}" \;
|
||||
|
||||
# remove pre-20201112 installation
|
||||
for tag in /tmp/.ytdl_*; do
|
||||
[ -e "$tag" ] || continue
|
||||
echo "$tag" |
|
||||
( while IFS=_ read _ ver _; do
|
||||
if [ "$ver" -lt 20201112 -a -e "${distpkgs}/youtube-dl" ]; then
|
||||
rm -f "$tag"
|
||||
find "${distpkgs}/youtube-dl" -name '*.pyc' -exec rm -f "{}" \;
|
||||
rmdir "${distpkgs}/youtube-dl" || true
|
||||
exit
|
||||
fi
|
||||
done )
|
||||
done
|
||||
|
||||
# background compile
|
||||
youtube-dl --version >/dev/null &
|
||||
|
||||
exit 0
|
||||
|
|
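The postinst's settings merge works by listing each default as alternative spellings (e.g. `-f|--format`) and appending it only when `is_set` finds no spelling already present in the config. A loose Python analogue of that idea, with hypothetical names and a simple substring check in place of `grep -w`:

```python
def merge_defaults(existing, defaults):
    """Sketch of the postinst logic: append a default option only if no
    alternative spelling of it already appears in the existing config.
    `defaults` is a list of ((name, alt_name, ...), value) pairs."""
    merged = list(existing)
    for names, value in defaults:
        if not any(n in line for line in existing for n in names):
            # like "${opt%%|*}": keep only the first spelling
            merged.append(' '.join(filter(None, (names[0], value))))
    return merged

cfg = merge_defaults(['--format best'],
                     [(('-f', '--format'), '"best[height<=?1080]"'),
                      (('--restrict-filenames',), '')])
print(cfg)  # ['--format best', '--restrict-filenames']
```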
@@ -0,0 +1,8 @@
#!/bin/sh
cfgfile='/mod/etc/youtube-dl.conf'
oldcfgfile="${cfgfile}.old"
pkgdir="/mod/lib/python2.7/dist-packages/youtube_dl"
[ -f "$cfgfile" ] && rm "$cfgfile"
[ -f "$oldcfgfile" ] && rm "$oldcfgfile"
[ -d "$pkgdir" ] && rm -r "$pkgdir"
exit 0
@@ -0,0 +1,8 @@
#!/bin/sh
CTL=/mod/var/opkg/info/youtube-dl.control
[ -r "$CTL" ] &&
grep -E '^Version:' "$CTL" |
( while IFS=".${IFS}" read _ yy mm dd _; do
	echo >"/tmp/.ytdl_${yy}${mm}${dd}"
	break
done )
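This prerm script splits the control file's `Version:` line on dots and whitespace (`IFS=".${IFS}"`) and records the install date as a `/tmp/.ytdl_YYYYMMDD` marker, which the postinst later compares against 20201112. A small Python sketch of that parsing (function name is illustrative):

```python
def marker_name(control_line):
    """Turn 'Version: 2021.06.06.1' into the /tmp/.ytdl_YYYYMMDD
    marker path, like the prerm's IFS-based read loop."""
    _, version = control_line.split(None, 1)
    yy, mm, dd = version.split('.')[:3]
    return '/tmp/.ytdl_%s%s%s' % (yy, mm, dd)

print(marker_name('Version: 2021.06.06.1'))  # /tmp/.ytdl_20210606
```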
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/bin/env python

import youtube_dl
@@ -66,9 +66,9 @@ class TestAllURLsMatching(unittest.TestCase):
        self.assertMatch('https://www.youtube.com/feed/watch_later', ['youtube:tab'])
        self.assertMatch('https://www.youtube.com/feed/subscriptions', ['youtube:tab'])

    # def test_youtube_search_matching(self):
    #     self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
    #     self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])
    def test_youtube_search_matching(self):
        self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
        self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])

    def test_facebook_matching(self):
        self.assertTrue(FacebookIE.suitable('https://www.facebook.com/Shiniknoh#!/photo.php?v=10153317450565268'))
@@ -65,6 +65,8 @@ from youtube_dl.utils import (
    sanitize_filename,
    sanitize_path,
    sanitize_url,
    extract_user_pass,
    sanitized_Request,
    expand_path,
    prepend_extension,
    replace_extension,

@@ -237,6 +239,26 @@ class TestUtil(unittest.TestCase):
        self.assertEqual(sanitize_url('rmtps://foo.bar'), 'rtmps://foo.bar')
        self.assertEqual(sanitize_url('https://foo.bar'), 'https://foo.bar')

    def test_extract_user_pass(self):
        self.assertEqual(extract_user_pass('http://foo.bar'), ('http://foo.bar', None, None))
        self.assertEqual(extract_user_pass('http://:foo.bar'), ('http://:foo.bar', None, None))
        self.assertEqual(extract_user_pass('http://@foo.bar'), ('http://foo.bar', '', ''))
        self.assertEqual(extract_user_pass('http://:pass@foo.bar'), ('http://foo.bar', '', 'pass'))
        self.assertEqual(extract_user_pass('http://user:@foo.bar'), ('http://foo.bar', 'user', ''))
        self.assertEqual(extract_user_pass('http://user:pass@foo.bar'), ('http://foo.bar', 'user', 'pass'))

    def test_sanitized_Request(self):
        self.assertFalse(sanitized_Request('http://foo.bar').has_header('Authorization'))
        self.assertFalse(sanitized_Request('http://:foo.bar').has_header('Authorization'))
        self.assertEqual(sanitized_Request('http://@foo.bar').get_header('Authorization'),
                         'Basic Og==')
        self.assertEqual(sanitized_Request('http://:pass@foo.bar').get_header('Authorization'),
                         'Basic OnBhc3M=')
        self.assertEqual(sanitized_Request('http://user:@foo.bar').get_header('Authorization'),
                         'Basic dXNlcjo=')
        self.assertEqual(sanitized_Request('http://user:pass@foo.bar').get_header('Authorization'),
                         'Basic dXNlcjpwYXNz')

    def test_expand_path(self):
        def env(var):
            return '%{0}%'.format(var) if sys.platform == 'win32' else '${0}'.format(var)
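The new `test_extract_user_pass` cases pin down the expected contract: userinfo is removed from the netloc and returned separately, with `None` for URLs that have no `@` at all. The real helper in `youtube_dl.utils` may be implemented differently; a minimal Python 3 sketch that satisfies the same cases:

```python
from urllib.parse import urlsplit, urlunsplit

def extract_user_pass(url):
    """Return (clean_url, username, password); (url, None, None)
    when the netloc carries no userinfo."""
    parts = urlsplit(url)
    if '@' not in parts.netloc:
        return url, None, None
    userinfo, host = parts.netloc.rsplit('@', 1)
    user, _, password = userinfo.partition(':')
    return urlunsplit(parts._replace(netloc=host)), user, password

print(extract_user_pass('http://user:pass@foo.bar'))  # ('http://foo.bar', 'user', 'pass')
```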
@@ -0,0 +1,2 @@
#!/bin/sh
git pull masterGL master:masterGL
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/bin/env python
# coding: utf-8

from __future__ import absolute_import, unicode_literals
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/bin/env python
# coding: utf-8

from __future__ import unicode_literals
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/mod/bin/busybox/env python
from __future__ import unicode_literals

# Execute with
@@ -24,6 +24,7 @@ from ..utils import (
    get_element_by_class,
    int_or_none,
    js_to_json,
    parse_bitrate,
    parse_duration,
    parse_iso8601,
    strip_or_none,

@@ -68,6 +69,8 @@ class BBCCoUkIE(InfoExtractor):

    _EMP_PLAYLIST_NS = 'http://bbc.co.uk/2008/emp/playlist'

    _DESCRIPTION_KEY = 'synopses'

    _TESTS = [
        {
            'url': 'http://www.bbc.co.uk/programmes/b039g8p7',

@@ -262,6 +265,21 @@ class BBCCoUkIE(InfoExtractor):
    }, {
        'url': 'https://www.bbc.co.uk/programmes/w172w4dww1jqt5s',
        'only_matching': True,
    }, {
        # audio-described
        'url': 'https://www.bbc.co.uk/iplayer/episode/m000b1v0/ad/his-dark-materials-series-1-1-lyras-jordan',
        'info_dict': {
            'id': 'p07ss5kj',
            'ext': 'mp4',
            'title': 'His Dark Materials - Series 1: 1. Lyra\u2019s Jordan - Audio Described',
            'description': 'Orphan Lyra Belacqua\'s world is turned upside-down by her long-absent uncle\'s return from the north, while the glamorous Mrs Coulter visits Jordan College with a proposition.',
            'duration': 3407,
        },
        'params': {
            # rtmp download
            'skip_download': True,
        },
        'skip': 'geolocation',
    }]

    def _login(self):

@@ -317,6 +335,10 @@ class BBCCoUkIE(InfoExtractor):
    def _extract_connections(self, media):
        return media.get('connection') or []

    def _get_description(self, data):
        synopses = try_get(data, lambda x: x[self._DESCRIPTION_KEY], dict) or {}
        return dict_get(synopses, ('large', 'medium', 'small'))

    def _get_subtitles(self, media, programme_id):
        subtitles = {}
        for connection in self._extract_connections(media):

@@ -352,7 +374,9 @@ class BBCCoUkIE(InfoExtractor):
                last_exception = e
                continue
            self._raise_extractor_error(e)
        self._raise_extractor_error(last_exception)
        if last_exception and not formats:
            self._raise_extractor_error(last_exception)
        return formats, subtitles

    def _download_media_selector_url(self, url, programme_id=None):
        media_selection = self._download_json(
@@ -542,15 +566,45 @@ class BBCCoUkIE(InfoExtractor):

        programme_id = None
        duration = None
        description = None
        thumbnail = None

        tviplayer = self._search_regex(
            r'mediator\.bind\(({.+?})\s*,\s*document\.getElementById',
            webpage, 'player', default=None)
        # current pages embed data from http://www.bbc.co.uk/programmes/PID.json
        # similar data available at http://ibl.api.bbc.co.uk/ibl/v1/episodes/PID
        redux_state = self._parse_json(self._html_search_regex(
            r'<script\b[^>]+id=(["\'])tvip-script-app-store\1[^>]*>[^<]*_REDUX_STATE__\s*=\s*(?P<json>[^<]+)\s*;\s*<',
            webpage, 'redux state', default='{}', group='json'), group_id, fatal=False)
        episode = redux_state.get('episode', {})
        if episode.get('id') == group_id:
            # try to match the version against the page's version
            current_version = episode.get('currentVersion')
            kinds = ['original']
            if current_version == 'ad':
                kinds.insert(0, 'audio-described')
            for kind in kinds:
                for version in redux_state.get('versions', {}):
                    if try_get(version, lambda x: x['kind'], compat_str) == kind:
                        programme_id = version.get('id')
                        duration = try_get(version, lambda x: x['duration']['seconds'], int)
                        break
                if programme_id:
                    break
            if programme_id:
                description = self._get_description(episode)
                thumbnail = try_get(episode, lambda x: x['images']['standard'], compat_str)
                if thumbnail:
                    thumbnail = thumbnail.format(recipe='raw')

        if tviplayer:
            player = self._parse_json(tviplayer, group_id).get('player', {})
            duration = int_or_none(player.get('duration'))
            programme_id = player.get('vpid')
        if not programme_id:
            # still valid?
            tviplayer = self._search_regex(
                r'mediator\.bind\(({.+?})\s*,\s*document\.getElementById',
                webpage, 'player', default=None)

            if tviplayer:
                player = self._parse_json(tviplayer, group_id).get('player', {})
                duration = int_or_none(player.get('duration'))
                programme_id = player.get('vpid')

        if not programme_id:
            programme_id = self._search_regex(

@@ -561,7 +615,7 @@ class BBCCoUkIE(InfoExtractor):
        title = self._og_search_title(webpage, default=None) or self._html_search_regex(
            (r'<h2[^>]+id="parent-title"[^>]*>(.+?)</h2>',
             r'<div[^>]+class="info"[^>]*>\s*<h1>(.+?)</h1>'), webpage, 'title')
        description = self._search_regex(
        description = description or self._search_regex(
            (r'<p class="[^"]*medium-description[^"]*">([^<]+)</p>',
             r'<div[^>]+class="info_+synopsis"[^>]*>([^<]+)</div>'),
            webpage, 'description', default=None)

@@ -576,7 +630,7 @@ class BBCCoUkIE(InfoExtractor):
            'id': programme_id,
            'title': title,
            'description': description,
            'thumbnail': self._og_search_thumbnail(webpage, default=None),
            'thumbnail': thumbnail or self._og_search_thumbnail(webpage, default=None),
            'duration': duration,
            'formats': formats,
            'subtitles': subtitles,
@@ -638,9 +692,7 @@ class BBCIE(BBCCoUkIE):
            'skip_download': True,
        }
    }, {
        # article with single video embedded with data-playable containing XML playlist
        # with direct video links as progressiveDownloadUrl (for now these are extracted)
        # and playlist with f4m and m3u8 as streamingUrl
        # article with single video (formerly) embedded, now using SIMORGH_DATA JSON
        'url': 'http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu',
        'info_dict': {
            'id': '150615_telabyad_kentin_cogu',

@@ -652,12 +704,13 @@ class BBCIE(BBCCoUkIE):
        },
        'params': {
            'skip_download': True,
        }
        },
        'skip': 'Video no longer embedded, 2021',
    }, {
        # single video embedded with data-playable containing XML playlists (regional section)
        # single video embedded, legacy media, in promo object of SIMORGH_DATA JSON
        'url': 'http://www.bbc.com/mundo/video_fotos/2015/06/150619_video_honduras_militares_hospitales_corrupcion_aw',
        'info_dict': {
            'id': '150619_video_honduras_militares_hospitales_corrupcion_aw',
            'id': '39275083',
            'ext': 'mp4',
            'title': 'Honduras militariza sus hospitales por nuevo escándalo de corrupción',
            'description': 'md5:1525f17448c4ee262b64b8f0c9ce66c8',

@@ -750,6 +803,16 @@ class BBCIE(BBCCoUkIE):
            'description': 'Fast-paced football, wit, wisdom and a ready smile - why Liverpool fans should come to love new boss Jurgen Klopp.',
        },
        'playlist_count': 3,
    }, {
        # single video embedded, data in playlistObject of playerSettings
        'url': 'https://www.bbc.com/news/av/embed/p07xmg48/50670843',
        'info_dict': {
            'id': 'p07xmg48',
            'ext': 'mp4',
            'title': 'General election 2019: From the count, to your TV',
            'description': 'General election 2019: From the count, to your TV',
            'duration': 160,
        },
    }, {
        # school report article with single video
        'url': 'http://www.bbc.co.uk/schoolreport/35744779',

@@ -813,6 +876,17 @@ class BBCIE(BBCCoUkIE):
    }, {
        # BBC Reel
        'url': 'https://www.bbc.com/reel/video/p07c6sb6/how-positive-thinking-is-harming-your-happiness',
        'info_dict': {
            'id': 'mind-matters',
            'title': 'Mind Matters',
            'description': 'Uncovering the mysteries of our minds and the importance of mental health and well-being.',
            'duration': 3083,
            'upload_date': '20181214',
        },
        'playlist_count': 13,
    }, {
        # BBC Reel playlist and video => video
        'url': 'https://www.bbc.com/reel/video/p07c6sb6/how-positive-thinking-is-harming-your-happiness',
        'info_dict': {
            'id': 'p07c6sb9',
            'ext': 'mp4',

@@ -824,6 +898,86 @@ class BBCIE(BBCCoUkIE):
            'upload_date': '20190604',
            'categories': ['Psychology'],
        },
        'params': {
            'no-playlist': True,
        },
    }, {
        # BBC Reel video and playlist => video
        'url': 'https://www.bbc.com/reel/video/p099tghy/is-phrenology-the-weirdest-pseudoscience-of-them-all-',
        'info_dict': {
            'id': 'p07c6sb9',
            'ext': 'mp4',
            'title': 'How positive thinking is harming your happiness',
            'alt_title': 'The downsides of positive thinking',
            'description': 'md5:fad74b31da60d83b8265954ee42d85b4',
            'duration': 235,
            'thumbnail': r're:https?://.+/p07c9dsr.jpg',
            'upload_date': '20190604',
            'categories': ['Psychology'],
        },
    }, {
        # BBC World Service etc: media nested in content object of SIMORGH_DATA JSON
        'url': 'http://www.bbc.co.uk/scotland/articles/cm49v4x1r9lo',
        'info_dict': {
            'id': 'p06p040v',
            'ext': 'mp4',
            'title': 'Five things ants can teach us about management',
            'description': 'They may be tiny, but us humans could learn a thing or two from ants.',
            'duration': 191,
            'thumbnail': r're:https?://.+/p06p0qzv.jpg',
            'upload_date': '20181016',
        },
    }, {
        # BBC Reel specified video and playlist => video
        'url': 'https://www.bbc.com/reel/playlist/mind-matters?vpid=p0962h5x',
        'info_dict': {
            'id': 'p095rkvg',
            'ext': 'mp4',
            'title': 'Can you really have a \'photographic\' memory?',
            'alt_title': 'Why your memory is not like a camera',
            'description': 'md5:00000000000000000000000000000000',
            'duration': 211,
            'thumbnail': r're:https?://.+/p095rrbz.jpg',
            'upload_date': '20210202',
            'categories': ['Neuroscience'],
        },
    }, {
        # BBC Reel specified video and playlist => playlist
        'info_dict': {
            'id': 'mind-matters',
            'title': 'Mind Matters',
            'description': 'Uncovering the mysteries of our minds and the importance of mental health and well-being.',
            'duration': 3083,
            'upload_date': '20181214',
        },
        'playlist_count': 13,
        'params': {
            'no-playlist': False,
        },
    }, {
        # BBC Weather
        'url': 'https://www.bbc.co.uk/weather/features/55581056',
        'info_dict': {
            'id': 'p093xhxl',
            'ext': 'mp4',
            'title': 'Weather for the Week Ahead',
            'description': 'There\'ll be a battle between colder and milder weather in the coming few days, before it turns chillier once again.',
            'duration': 209,
            'thumbnail': r're:https?://.+/p093xk3z.jpg',
            'upload_date': '20210113',
        },
    }, {
        # BBC Bitesize
        'url': 'https://www.bbc.co.uk/bitesize/guides/zgvq4qt/revision/6',
        'info_dict': {
            'id': 'p04yj749',
            'ext': 'mp4',
            'title': 'Circuits',
            'description': 'Learn about and revise electrical circuits, charge, current, power and resistance with GCSE Bitesize Combined Science.',
            'duration': 205,
            'thumbnail': r're:https?://.+/p04z1ckk.jpg',
            'upload_date': '20180223',
        },
    }]

    @classmethod
@@ -873,25 +1027,56 @@ class BBCIE(BBCCoUkIE):
            'subtitles': subtitles,
        }

    def _extract_from_playlist_object(self, playlist_object):
        title = playlist_object.get('title')
        item_0 = try_get(playlist_object, lambda x: x['items'][0], dict)
        if item_0 and title:
            description = playlist_object.get('summary')
            duration = int_or_none(item_0.get('duration'))
            programme_id = dict_get(item_0, ('vpid', 'versionID'))
            if programme_id:
                return {
                    'id': programme_id,
                    'title': title,
                    'description': description,
                    'duration': duration,
                }
        return {}

    def _get_playlist_entry(self, entry):
        programme_id = entry.get('id')
        if not programme_id:
            return
        formats, subtitles = self._download_media_selector(programme_id)
        self._sort_formats(formats)
        entry.update({
            'formats': formats,
            'subtitles': subtitles,
        })
        return entry

    def _real_extract(self, url):
        playlist_id = self._match_id(url)

        webpage = self._download_webpage(url, playlist_id)

        json_ld_info = self._search_json_ld(webpage, playlist_id, default={})
        timestamp = json_ld_info.get('timestamp')

        playlist_title = json_ld_info.get('title')
        if not playlist_title:
            playlist_title = self._og_search_title(
                webpage, default=None) or self._html_search_regex(
                r'<title>(.+?)</title>', webpage, 'playlist title', default=None)
            playlist_title = (self._html_search_regex(r'<title\b[^>]*>(.+)</title>', webpage, 'playlist title', default=None)
                or self._og_search_title(webpage, name='playlist title', default=None)
                or self._html_search_meta('title', webpage, display_name='playlist title'))
        if playlist_title:
            playlist_title = re.sub(r'(.+)\s*-\s*BBC.*?$', r'\1', playlist_title).strip()
            playlist_title = re.sub(r'^(BBC.*?\s*-\s*)?(.+)(?(1)|\s*-\s*BBC.*?)$', r'\2', playlist_title).strip()

        playlist_description = json_ld_info.get(
            'description') or self._og_search_description(webpage, default=None)
        playlist_description = json_ld_info.get('description')
        if not playlist_description:
            playlist_description = (self._og_search_description(webpage, default=None)
                or self._html_search_meta('description', webpage, default=None))
        if playlist_description:
            playlist_description = playlist_description.strip()

        timestamp = json_ld_info.get('timestamp')
        if not timestamp:
            timestamp = parse_iso8601(self._search_regex(
                [r'<meta[^>]+property="article:published_time"[^>]+content="([^"]+)"',
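The replacement title regex above is worth unpacking: it uses a conditional group, `(?(1)yes|no)`, so the trailing `" - BBC …"` suffix is only required when no leading `"BBC … - "` prefix (group 1) was matched — one pass handles both site-name placements. A standalone sketch of the same substitution (sample titles are illustrative):

```python
import re

def clean_title(title):
    """Strip a leading 'BBC ... - ' prefix or, failing that,
    a trailing ' - BBC ...' suffix, in a single re.sub pass."""
    return re.sub(r'^(BBC.*?\s*-\s*)?(.+)(?(1)|\s*-\s*BBC.*?)$', r'\2', title).strip()

print(clean_title('Doctor Who - BBC One'))            # Doctor Who
print(clean_title('BBC News - Top story of the day'))  # Top story of the day
```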
@@ -903,6 +1088,7 @@ class BBCIE(BBCCoUkIE):

        # article with multiple videos embedded with playlist.sxml (e.g.
        # http://www.bbc.com/sport/0/football/34475836)
        # - obsolete?
        playlists = re.findall(r'<param[^>]+name="playlist"[^>]+value="([^"]+)"', webpage)
        playlists.extend(re.findall(r'data-media-id="([^"]+/playlist\.sxml)"', webpage))
        if playlists:

@@ -920,27 +1106,17 @@ class BBCIE(BBCCoUkIE):
                    continue
                settings = data_playable.get('settings', {})
                if settings:
                    # data-playable with video vpid in settings.playlistObject.items (e.g.
                    # http://www.bbc.com/news/world-us-canada-34473351)
                    # data-playable with video vpid in settings.playlistObject.items
                    # obsolete? example previously quoted uses __INITIAL_DATA__ now
                    playlist_object = settings.get('playlistObject', {})
                    if playlist_object:
                        items = playlist_object.get('items')
                        if items and isinstance(items, list):
                            title = playlist_object['title']
                            description = playlist_object.get('summary')
                            duration = int_or_none(items[0].get('duration'))
                            programme_id = items[0].get('vpid')
                            formats, subtitles = self._download_media_selector(programme_id)
                            self._sort_formats(formats)
                            entries.append({
                                'id': programme_id,
                                'title': title,
                                'description': description,
                        entry = self._extract_from_playlist_object(playlist_object)
                        entry = self._get_playlist_entry(entry)
                        if entry:
                            entry.update({
                                'timestamp': timestamp,
                                'duration': duration,
                                'formats': formats,
                                'subtitles': subtitles,
                            })
                            entries.append(entry)
                    else:
                        # data-playable without vpid but with a playlist.sxml URLs
                        # in otherSettings.playlist (e.g.
@@ -970,7 +1146,25 @@ class BBCIE(BBCCoUkIE):
if entry:
self._sort_formats(entry['formats'])
entries.append(entry)

else:
# embed video with playerSettings, eg
# https://www.bbc.com/news/av/embed/p07xmg48/50670843
settings = self._html_search_regex(
r'<script\b[^>]+>.+\.playerSettings\s*=\s*(?P<json>\{.*\})\s*(?:,\s*function\s*\(\s*\)\s*\{\s*["\']use strict.+\(\s*\)\s*)?</script\b',
webpage, 'player settings', default='{}', group='json')
settings = self._parse_json(settings, playlist_id, transform_source=js_to_json, fatal=False)
if settings:
playlist_object = settings.get('playlistObject', {})
if playlist_object:
entry = self._extract_from_playlist_object(playlist_object)
entry = self._get_playlist_entry(entry)
if entry:
thumbnail = playlist_object.get('holdingImageURL')
entry.update({
'timestamp': timestamp,
'thumbnail': thumbnail.replace('$recipe', 'raw') if thumbnail else None,
})
entries.append(entry)
if entries:
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
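The playerSettings branch above pulls a JSON object out of an inline `<script>` with a regex before handing it to `_parse_json`. A minimal standalone sketch of that pattern (the HTML snippet here is made up; real pages can use relaxed JS syntax and need `js_to_json` first):

```python
import json
import re

# hypothetical page fragment resembling a playerSettings assignment
html = '<script>exports.playerSettings = {"playlistObject": {"title": "Clip", "items": [{"vpid": "p07xmg48"}]}};</script>'

m = re.search(r'\.playerSettings\s*=\s*(?P<json>\{.*\})\s*;', html)
settings = json.loads(m.group('json')) if m else {}
playlist_object = settings.get('playlistObject') or {}
print(playlist_object.get('title'))  # Clip
```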

@@ -1012,56 +1206,96 @@ class BBCIE(BBCCoUkIE):
}

# bbc reel (e.g. https://www.bbc.com/reel/video/p07c6sb6/how-positive-thinking-is-harming-your-happiness)
# playlist pages have a current video (first in the list), plus links to the other videos
initial_data = self._parse_json(self._html_search_regex(
r'<script[^>]+id=(["\'])initial-data\1[^>]+data-json=(["\'])(?P<json>(?:(?!\2).)+)',
webpage, 'initial data', default='{}', group='json'), playlist_id, fatal=False)
if initial_data:
init_data = try_get(
initial_data, lambda x: x['initData']['items'][0], dict) or {}
smp_data = init_data.get('smpData') or {}
clip_data = try_get(smp_data, lambda x: x['items'][0], dict) or {}
version_id = clip_data.get('versionID')
if version_id:
title = smp_data['title']
formats, subtitles = self._download_media_selector(version_id)
self._sort_formats(formats)
image_url = smp_data.get('holdingImageURL')
display_date = init_data.get('displayDate')
topic_title = init_data.get('topicTitle')
init_items = try_get(
initial_data, lambda x: x['initData']['items'], list) or []
# Reel pages may have an active video and a playlist as well
# If the URL implies playlist, let --no-playlist select the video
# If the URL implies video (includes a PID string other than 'playlist'),
# let --yes-playlist select the playlist
# If the URL has parameter vpid set in the query string, treat it as
# implying a video and find that exact versionID in the playlist
noplaylist = self._downloader.params.get('noplaylist')
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
vpid = try_get(qs, lambda x: x['vpid'][0], compat_str)
single_pid = vpid or \
re.search(r'[/=](?!playlist\b)%s\b' % self._ID_REGEX, url)
if len(init_items) > 1:
if noplaylist and not single_pid:
self.to_screen('Downloading single video because of --no-playlist')
elif noplaylist is False and single_pid:
self.to_screen('Downloading playlist because of --yes-playlist')
if noplaylist is None:
noplaylist = single_pid
elif vpid and not noplaylist:
vpid = None
for item in init_items:
smp_data = try_get(item, lambda x: x['smpData'])
if not smp_data:
continue
entry = None
clip_data = try_get(smp_data, lambda x: x['items'][0], dict) or {}
version_id = clip_data.get('versionID')
if version_id:
if vpid and vpid != version_id:
continue
title = smp_data['title']
formats, subtitles = self._download_media_selector(version_id)
self._sort_formats(formats)
image_url = smp_data.get('holdingImageURL')
display_date = item.get('displayDate')
topic_title = item.get('topicTitle')
return {
'id': version_id,
'title': title,
'formats': formats,
'alt_title': init_data.get('shortTitle'),
'thumbnail': image_url.replace('$recipe', 'raw') if image_url else None,
'description': smp_data.get('summary') or init_data.get('shortSummary'),
'upload_date': display_date.replace('-', '') if display_date else None,
'subtitles': subtitles,
'duration': int_or_none(clip_data.get('duration')),
'categories': [topic_title] if topic_title else None,
}
entry = {
'id': version_id,
'title': title,
'formats': formats,
'alt_title': item.get('shortTitle'),
'thumbnail': image_url.replace('$recipe', 'raw') if image_url else None,
'description': smp_data.get('summary') or item.get('shortSummary'),
'upload_date': display_date.replace('-', '') if display_date else None,
'subtitles': subtitles,
'duration': int_or_none(clip_data.get('duration')),
'categories': [topic_title] if topic_title else None,
}
if entry:
if noplaylist:
return entry
entries.append(entry)

if entries:
initial_data = initial_data['initData']
title = initial_data.get('title')
description = initial_data.get('summary')
return self.playlist_result(entries, playlist_id, title, description)

# Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
# There are several setPayload calls may be present but the video
# seems to be always related to the first one
# Several setPayload calls may be present so pick the one with 'asset-data'
# or 'page-component-data'
# For Weather, use 'asset-with-media'
# For Bitesize, use 'guide-data'
morph_payload = self._parse_json(
self._search_regex(
r'Morph\.setPayload\([^,]+,\s*({.+?})\);',
r'Morph\.setPayload\s*\([^,]+-(?:asset-data|page-component-data|asset-with-media|guide-data)/[^,]+,\s*(\{.+[]}]\s*})\s*\)(?:\s*;\s*}\s*\))?\s*;\s*</script',
webpage, 'morph payload', default='{}'),
playlist_id, fatal=False)
if morph_payload:
# try for components
components = try_get(morph_payload, lambda x: x['body']['components'], list) or []
for component in components:
if not isinstance(component, dict):
continue
lead_media = try_get(component, lambda x: x['props']['leadMedia'], dict)
if not lead_media:
lead_media = try_get(component, lambda x: x['props']['supportingMedia'][0], dict)
if not lead_media:
continue
identifiers = lead_media.get('identifiers')
if not identifiers or not isinstance(identifiers, dict):
continue
programme_id = identifiers.get('vpid') or identifiers.get('playablePid')
programme_id = dict_get(identifiers, ('vpid', 'playablePid'))
if not programme_id:
continue
title = lead_media.get('title') or self._og_search_title(webpage)
@@ -1085,6 +1319,233 @@ class BBCIE(BBCCoUkIE):
'formats': formats,
'subtitles': subtitles,
}
# another type (asset-data/)
body_media = try_get(morph_payload, lambda x: x['body'], dict) or {}
# check for variant but similar format found with Weather
# dict.values() is a view in Python 3, a list in Python 2
primary_video = try_get(body_media, lambda x: list(x['media']['videos']['primary'].values())[0], dict)
if primary_video:
body_media.update(primary_video)
programme_id = body_media.get('versionPid')
else:
# Bite-size
page_children = try_get(body_media, lambda x: x['chapterData']['page']['children'], list) or []

def chdata_extract_media(children):
for child in children:
type = try_get(child, lambda x: x['type'], compat_str)
if type != 'element':
continue
if child.get('name') == 'media':
return try_get(child, lambda x: x['attributes'], dict)
media = chdata_extract_media(child.get('children'))
if media:
return media

media = chdata_extract_media(page_children)
if media:
programme_id = media.get('vpid')
if programme_id:
body_media.update(media)
if not programme_id:
body_media.update(body_media.get('media') or {})
programme_id = body_media.get('pid')
if programme_id:
title = (body_media.get('title')
or self._og_search_title(webpage)
or self._html_search_meta('title', webpage))
formats, subtitles = self._download_media_selector(programme_id)
self._sort_formats(formats)
image_url = dict_get(body_media, ('holdingImageUrl', 'holdingImage'))
return {
'id': programme_id,
'title': title,
'formats': formats,
'subtitles': subtitles,
'thumbnail': re.sub(r'(\{width}xn|\$recipe)', 'raw', image_url) if image_url else None,
'duration': parse_duration(dict_get(body_media, ('duration', 'durationSeconds'))),
'description': (try_get(body_media, lambda x: x['promos']['summary'], compat_str)
or dict_get(body_media, ('summary', 'shortSynopsis'))
or self._html_search_meta('description', webpage)),
'timestamp': parse_iso8601(dict_get(body_media, ('dateTime', 'lastUpdated', 'lastModified'))),
}
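The `chdata_extract_media` helper above walks the Bitesize `chapterData` children tree depth-first, looking for an element node named `media`. The same search can be sketched in isolation on synthetic data (the node shapes here mimic, but are not taken from, a real page):

```python
def find_media(children):
    # depth-first search for an 'element' node whose name is 'media'
    for child in children or []:
        if not isinstance(child, dict) or child.get('type') != 'element':
            continue
        if child.get('name') == 'media':
            return child.get('attributes')
        found = find_media(child.get('children'))
        if found:
            return found

# synthetic tree: a section wrapping a media element
page = [{'type': 'element', 'name': 'section', 'children': [
    {'type': 'element', 'name': 'media', 'attributes': {'vpid': 'p012345'}},
]}]
print(find_media(page))  # {'vpid': 'p012345'}
```

Note the `children or []` guard: leaf nodes may have no `children` key, so the recursive call must tolerate `None`.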

# morph-based playlist (replaces playlist.sxml)
# a JS setPayload call with arg1 containing the playlist_id has JSON in arg2;
# deeply nested within it is our target string containing more JSON ...
morph_payload = self._parse_json(
self._search_regex(
r'Morph\.setPayload\s*\([^,]+%s%s%s[^,]+,\s*(\{.+[]}]\s*})\s*\)\s*;' % ('%2F', playlist_id, '%22%2CisStory%3Atrue'),
webpage, 'morph playlist payload', default='{}'),
playlist_id, fatal=False)
if morph_payload:
# looking for a string containing a JSON list
components = try_get(morph_payload, lambda x: x['body']['content']['article']['body'], compat_str) or '[]'
components = self._parse_json(components, playlist_id, fatal=False) or []
for component in components:
if component.get('name') != 'video':
continue
component = component.get('videoData') or {}
programme_id = dict_get(component, ('vpid', 'pid'))
if programme_id:
formats, subtitles = self._download_media_selector(programme_id)
if not formats:
continue
self._sort_formats(formats)
entries.append({
'id': programme_id,
'title': component.get('title', 'Unnamed clip %s' % programme_id),
'formats': formats,
'subtitles': subtitles,
'thumbnail': dict_get(component, ('iChefImage', 'image')),
'duration': parse_duration(component.get('duration')),
'description': component.get('caption'),
})
if entries:
return self.playlist_result(
entries,
playlist_id,
playlist_title,
playlist_description)
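The morph playlist branch above decodes JSON twice: the `setPayload` argument is JSON, and `article.body` inside it is a *string* that itself holds a JSON list of components. A sketch of that double decode (payload and field values are synthetic):

```python
import json

# synthetic morph payload: article.body is a JSON string inside JSON
payload = {'body': {'content': {'article': {'body': json.dumps([
    {'name': 'video', 'videoData': {'vpid': 'p0abcdef', 'title': 'Clip'}},
    {'name': 'text', 'text': 'not a video'},
])}}}}

components = json.loads(payload['body']['content']['article']['body'])
vpids = [c['videoData']['vpid'] for c in components if c.get('name') == 'video']
print(vpids)  # ['p0abcdef']
```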

body_media = try_get(morph_payload, lambda x: x['body'], dict) or {}
body_media.update(body_media.get('media') or {})
programme_id = body_media.get('pid')
if programme_id:
title = (body_media.get('title')
or self._og_search_title(webpage)
or self._html_search_meta('title', webpage))
formats, subtitles = self._download_media_selector(programme_id)
self._sort_formats(formats)
image_url = body_media.get('holdingImageUrl')
return {
'id': programme_id,
'title': title,
'formats': formats,
'subtitles': subtitles,
'thumbnail': image_url.replace('{width}xn', 'raw') if image_url else None,
'duration': parse_duration(body_media.get('duration')),
'description': (try_get(body_media, lambda x: x['promos']['summary'], str)
or self._html_search_meta('description', webpage)),
'timestamp': parse_iso8601(body_media.get('dateTime')),
}

# morph-based playlist (replaces playlist.sxml?)
# a JS setPayload call with arg1 containing the playlist_id has JSON in arg2;
# deeply nested within it is our target string containing more JSON ...
morph_payload = self._parse_json(
self._search_regex(
r'Morph\.setPayload\s*\([^,]+%s%s%s[^,]+,\s*(\{.+[]}]\s*})\s*\)\s*;' % ('%2F', playlist_id, '%22%2CisStory%3Atrue'),
webpage, 'morph playlist payload', default='{}'),
playlist_id, fatal=False)
if morph_payload:
# looking for a string containing a JSON list
components = try_get(morph_payload, lambda x: x['body']['content']['article']['body'], compat_str) or '[]'
components = self._parse_json(components, playlist_id, fatal=False) or []
for component in components:
if component.get('name') != 'video':
continue
component = component.get('videoData') or {}
programme_id = dict_get(component, ('vpid', 'pid'))
if programme_id:
formats, subtitles = self._download_media_selector(programme_id)
if not formats:
continue
self._sort_formats(formats)
entries.append({
'id': programme_id,
'title': component.get('title', 'Unnamed clip %s' % programme_id),
'formats': formats,
'subtitles': subtitles,
'thumbnail': dict_get(component, ('iChefImage', 'image')),
'duration': parse_duration(component.get('duration')),
'description': component.get('caption'),
})
if entries:
return self.playlist_result(
entries,
playlist_id,
playlist_title,
playlist_description)

# simorgh-based playlist (see https://github.com/bbc/simorgh)
# JSON assigned to window.SIMORGH_DATA in a <script> element
simorgh_data = self._parse_json(
self._search_regex(
r'window\.SIMORGH_DATA\s*=\s*(\{[^<]+})\s*</',
webpage, 'simorgh playlist', default='{}'),
playlist_id, fatal=False)
# legacy media, video in promo object (eg, http://www.bbc.com/mundo/video_fotos/2015/06/150619_video_honduras_militares_hospitales_corrupcion_aw)
playlist = try_get(simorgh_data, lambda x: x['pageData']['promo']['media']['playlist']) or []
if playlist:
media = simorgh_data['pageData']['promo']
if media['media'].get('format') == 'video':
media.update(media['media'])
formats = []
keys = {'url', 'format', 'format_id', 'language', 'quality', 'tbr', 'resolution'}
for format in playlist:
if not (format.get('url') and format.get('format')):
continue
bitrate = format.pop('bitrate')
if bitrate:
bitrate = re.sub(r'000\s*$', 'kbps', bitrate)
format['tbr'] = parse_bitrate(bitrate)
format['language'] = media.get('language')
# format id: penultimate item from the url split on _ and .
(fmt,) = re.split('[_.]', format['url'])[-2:][:1]
format['format_id'] = '%s_%s' % (format['format'], fmt)
if not format.get('resolution'):
format['resolution'] = fmt
format['quality'] = -1
formats.append(dict((k, format[k]) for k in keys))
self._sort_formats(formats)
return {
'id': media.get('id'),
'title': (dict_get(media.get('headlines'),
('shortHeadline', 'headline'))
or playlist_title),
'description': media.get('summary') or playlist_description,
'formats': formats,
'subtitles': None,
'thumbnail': try_get(media, lambda x: x['image']['href']),
'timestamp': int_or_none(media.get('timestamp'), scale=1000)
}
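The format_id derivation above ("penultimate item from the url split on _ and .") can be checked in isolation. The media URL below is hypothetical, shaped like the legacy BBC Mundo downloads:

```python
import re

# hypothetical legacy media URL; the bitrate token sits before the extension
url = 'https://wsdownload.bbc.co.uk/mundo/video/150619_honduras_640x360_16x9_512k.mp4'

# penultimate item from the url split on _ and .
(fmt,) = re.split('[_.]', url)[-2:][:1]
print(fmt)  # 512k
```

The `[-2:][:1]` slice plus one-tuple unpacking is an IndexError-proof way of saying "second-to-last piece, if there are at least two".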

# general case: media nested in content object
# test: https://www.bbc.co.uk/scotland/articles/cm49v4x1r9lo
if simorgh_data:

def extract_media_from_simorgh(model):
if not isinstance(model, dict):
return
for block in model.get('blocks') or {}:
if block.get('type') == 'aresMediaMetadata':
vpid = try_get(block, lambda x: x['model']['versions'][0]['versionId'])
if vpid:
formats, subtitles = self._download_media_selector(vpid)
self._sort_formats(formats)
model = block['model']
version = model['versions'][0]
thumbnail = model.get('imageUrl')
return {
'id': vpid,
'title': model.get('title') or 'unnamed clip',
'description': dict_get(model.get('synopses') or {}, ('long', 'medium', 'short')),
'duration': (int_or_none(version.get('duration'))
or parse_duration(version.get('durationISO8601'))),
'timestamp': version.get('availableFrom'),
'thumbnail': urljoin(url, thumbnail.replace('$recipe', 'raw')) if thumbnail else None,
'formats': formats,
'subtitles': subtitles,
}
else:
entry = extract_media_from_simorgh(block.get('model'))
if entry:
return entry

playlist = extract_media_from_simorgh(try_get(simorgh_data, lambda x: x['pageData']['content']['model']))
if playlist:
return playlist

preload_state = self._parse_json(self._search_regex(
r'window\.__PRELOADED_STATE__\s*=\s*({.+?});', webpage,
@@ -1162,6 +1623,7 @@ class BBCIE(BBCCoUkIE):
return self.playlist_result(
entries, playlist_id, playlist_title, playlist_description)

# eg, http://www.bbc.com/news/world-us-canada-34473351
initial_data = self._parse_json(self._search_regex(
r'window\.__INITIAL_DATA__\s*=\s*({.+?});', webpage,
'preload state', default='{}'), playlist_id, fatal=False)

@@ -1176,7 +1638,11 @@ class BBCIE(BBCCoUkIE):
continue
formats, subtitles = self._download_media_selector(item_id)
self._sort_formats(formats)
item_desc = None
# make description by combining any .model.text strings in the .summary.blocks list
item_desc = ('\n\n'.join(filter(lambda x: x is not None,
map(lambda blk: try_get(blk, lambda x: x['model']['text'], compat_str),
try_get(media, lambda x: x['summary']['blocks'], list) or [])))
or None)
blocks = try_get(media, lambda x: x['summary']['blocks'], list)
if blocks:
summary = []
@@ -1352,7 +1818,7 @@ class BBCCoUkPlaylistBaseIE(InfoExtractor):
if single_page:
return
next_page = self._search_regex(
r'<li[^>]+class=(["\'])pagination_+next\1[^>]*><a[^>]+href=(["\'])(?P<url>(?:(?!\2).)+)\2',
r'<li[^>]+class=(["\'])pagination_+next\1[^>]*>\s*<a[^>]+href=(["\'])(?P<url>(?:(?!\2).)+)\2',
webpage, 'next page url', default=None, group='url')
if not next_page:
break
@@ -1360,6 +1826,13 @@ class BBCCoUkPlaylistBaseIE(InfoExtractor):
compat_urlparse.urljoin(url, next_page), playlist_id,
'Downloading page %d' % page_num, page_num)

def _extract_title_and_description(self, webpage):
title = (self._og_search_title(webpage, default=None)
or self._html_search_meta('title', webpage, display_name='playlist title', default='Unnamed playlist'))
description = (self._og_search_description(webpage, default=None)
or self._html_search_meta('description', webpage, default=None))
return title, description

def _real_extract(self, url):
playlist_id = self._match_id(url)
@@ -1416,7 +1889,7 @@ class BBCCoUkIPlayerPlaylistBaseIE(InfoExtractor):
per_page = 36 if page else self._PAGE_SIZE
fetch_page = functools.partial(self._fetch_page, pid, per_page, series_id)
entries = fetch_page(int(page) - 1) if page else OnDemandPagedList(fetch_page, self._PAGE_SIZE)
playlist_data = self._get_playlist_data(self._call_api(pid, 1))
playlist_data = self._get_playlist_data(self._call_api(pid, 1) or {})
return self.playlist_result(
entries, pid, self._get_playlist_title(playlist_data),
self._get_description(playlist_data))
@@ -1481,7 +1954,7 @@ class BBCCoUkIPlayerEpisodesIE(BBCCoUkIPlayerPlaylistBaseIE):

@staticmethod
def _get_elements(data):
return data['entities']['results']
return try_get(data, lambda x: x['entities']['results'], list)

@staticmethod
def _get_episode(element):
@@ -1553,7 +2026,7 @@ class BBCCoUkIPlayerGroupIE(BBCCoUkIPlayerPlaylistBaseIE):

@staticmethod
def _get_elements(data):
return data['elements']
return try_get(data, lambda x: x['elements'], list)

@staticmethod
def _get_episode(element):
@@ -1574,6 +2047,14 @@ class BBCCoUkIPlayerGroupIE(BBCCoUkIPlayerPlaylistBaseIE):
def _get_playlist_title(self, data):
return data.get('title')

def _extract_title_and_description(self, webpage):
title, description = super(BBCCoUkIPlayerGroupIE, self)._extract_title_and_description(webpage)
title = self._html_search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', default=title)
description = self._html_search_regex(
r'<p[^>]+class=(["\'])subtitle\1[^>]*>(?P<value>[^<]+)</p>',
webpage, 'description', group='value', default=description)
return title, description


class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):
IE_NAME = 'bbc.co.uk:playlist'
@@ -1616,8 +2097,3 @@ class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):
'url': 'http://www.bbc.co.uk/programmes/b055jkys/episodes/player',
'only_matching': True,
}]

def _extract_title_and_description(self, webpage):
title = self._og_search_title(webpage, fatal=False)
description = self._og_search_description(webpage)
return title, description
@@ -1520,6 +1520,7 @@ from .wdr import (
WDRElefantIE,
WDRMobileIE,
)
from .webarchive import WebArchiveIE
from .webcaster import (
WebcasterIE,
WebcasterFeedIE,
@@ -1610,7 +1611,7 @@ from .youtube import (
YoutubeRecommendedIE,
YoutubeSearchDateIE,
YoutubeSearchIE,
#YoutubeSearchURLIE,
YoutubeSearchURLIE,
YoutubeSubscriptionsIE,
YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE,
@@ -15,6 +15,7 @@ from ..utils import (
merge_dicts,
parse_duration,
smuggle_url,
try_get,
url_or_none,
)
@@ -23,15 +24,20 @@ class ITVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?itv\.com/hub/[^/]+/(?P<id>[0-9a-zA-Z]+)'
_GEO_COUNTRIES = ['GB']
_TESTS = [{
'url': 'https://www.itv.com/hub/liar/2a4547a0012',
'url': 'https://www.itv.com/hub/the-durrells/2a4156a0001',
'info_dict': {
'id': '2a4547a0012',
'id': '2a4156a0001',
'ext': 'mp4',
'title': 'Liar - Series 2 - Episode 6',
'description': 'md5:d0f91536569dec79ea184f0a44cca089',
'series': 'Liar',
'season_number': 2,
'episode_number': 6,
'title': 'The Durrells - Series 1 - Episode 1',
'description': 'md5:43ae58e27aa91720fc933a68a37e9e95',
'series': 'The Durrells',
'season_number': 1,
'episode_number': 1,
'subtitles': {
'en': [
{'url': 'https://itvpnpsubtitles.content.itv.com/2-4156-0001-003/Subtitles/3/WebVTT-OUT-OF-BAND/2-4156-0001-003_Series1590486890_TX000000.vtt'}
]
},
},
'params': {
# m3u8 download
@@ -87,13 +93,13 @@ class ITVIE(InfoExtractor):
},
'variantAvailability': {
'featureset': {
'min': ['hls', 'aes', 'outband-webvtt'],
'max': ['hls', 'aes', 'outband-webvtt']
'min': ['hls', 'aes'],
'max': ['hls', 'aes']
},
'platformTag': 'dotcom'
'platformTag': 'mobile'
}
}).encode(), headers=headers)
video_data = ios_playlist['Playlist']['Video']
video_data = try_get(ios_playlist, lambda x: x['Playlist']['Video'], dict) or {}
ios_base_url = video_data.get('Base')

formats = []
@@ -114,8 +120,36 @@ class ITVIE(InfoExtractor):
})
self._sort_formats(formats)

subs_playlist = self._download_json(
ios_playlist_url, video_id, data=json.dumps({
'user': {
'itvUserId': '',
'entitlements': [],
'token': ''
},
'device': {
'manufacturer': 'Safari',
'model': '5',
'os': {
'name': 'Windows NT',
'version': '6.1',
'type': 'desktop'
}
},
'client': {
'version': '4.1',
'id': 'browser'
},
'variantAvailability': {
'featureset': {
'min': ['mpeg-dash', 'widevine', 'outband-webvtt'],
'max': ['mpeg-dash', 'widevine', 'outband-webvtt']
},
'platformTag': 'mobile'
}
}).encode(), headers=headers)
subs = try_get(subs_playlist, lambda x: x['Playlist']['Video']['Subtitles'], list) or []
subtitles = {}
subs = video_data.get('Subtitles') or []
for sub in subs:
if not isinstance(sub, dict):
continue
@@ -0,0 +1,54 @@
# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor


class WebArchiveIE(InfoExtractor):
    _VALID_URL = r'https?:\/\/(?:www\.)?web\.archive\.org\/web\/([0-9]+)\/https?:\/\/(?:www\.)?youtube\.com\/watch\?v=(?P<id>[0-9A-Za-z_-]{1,11})$'
    _TEST = {
        'url': 'https://web.archive.org/web/20150415002341/https://www.youtube.com/watch?v=aYAGB11YrSs',
        'md5': 'ec44dc1177ae37189a8606d4ca1113ae',
        'info_dict': {
            'url': 'https://web.archive.org/web/2oe_/http://wayback-fakeurl.archive.org/yt/aYAGB11YrSs',
            'id': 'aYAGB11YrSs',
            'ext': 'mp4',
            'title': 'Team Fortress 2 - Sandviches!',
            'author': 'Zeurel',
        }
    }

    def _real_extract(self, url):
        # Get video ID and page
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)

        # Extract title and author
        title = self._html_search_regex(r'<title>(.+?)</title>', webpage, 'title').strip()
        author = self._html_search_regex(r'"author":"([a-zA-Z0-9]+)"', webpage, 'author').strip()

        # Parse title
        if title.endswith(' - YouTube'):
            title = title[:-10]

        # Use link translator mentioned in https://github.com/ytdl-org/youtube-dl/issues/13655
        link_stub = "https://web.archive.org/web/2oe_/http://wayback-fakeurl.archive.org/yt/"

        # Extract hash from url
        hash_idx = url.find("watch?v=") + len("watch?v=")
        youtube_hash = url[hash_idx:]

        # If there's an ampersand, cut off before it
        ampersand = youtube_hash.find('&')
        if ampersand != -1:
            youtube_hash = youtube_hash[:ampersand]

        # Recreate the fixed pattern url and return
        reconstructed_url = link_stub + youtube_hash
        return {
            'url': reconstructed_url,
            'id': video_id,
            'title': title,
            'author': author,
            'ext': "mp4"
        }
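The wayback-fakeurl trick in the new extractor maps an archived watch URL back to a direct media redirect. The URL handling can be sketched on its own (the example URL carries an extra `&t=10` parameter to exercise the ampersand trimming, although the extractor's own `_VALID_URL` would reject such a URL):

```python
link_stub = 'https://web.archive.org/web/2oe_/http://wayback-fakeurl.archive.org/yt/'

url = 'https://web.archive.org/web/20150415002341/https://www.youtube.com/watch?v=aYAGB11YrSs&t=10'

# take everything after 'watch?v='
youtube_hash = url.split('watch?v=', 1)[1]
# if there's an ampersand, cut off before it
youtube_hash = youtube_hash.split('&', 1)[0]

direct_url = link_stub + youtube_hash
print(direct_url)
```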

@@ -7,6 +7,7 @@ import json
import os.path
import random
import re
import string
import traceback

from .common import InfoExtractor, SearchInfoExtractor
@@ -1478,8 +1479,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
video_id = self._match_id(url)
base_url = self.http_scheme() + '//www.youtube.com/'
webpage_url = base_url + 'watch?v=' + video_id

# setting a random cookie helps to avoid http 429 errors
rnd1 = ''.join(random.choice(string.ascii_letters+string.digits) for i in range(11))
rnd2 = ''.join(random.choice(string.ascii_letters+string.digits) for i in range(11))
cookie = 'CONSENT=YES+cb.20210608-18-p0.de+FX+696; GPS=1; YSC='+rnd1+'; VISITOR_INFO1_LIVE='+rnd2+'; PREF=tz=Europe.London'
webpage = self._download_webpage(
webpage_url + '&bpctr=9999999999&has_verified=1', video_id, fatal=False)
webpage_url + '&bpctr=9999999999&has_verified=1', video_id, fatal=False, headers={'Cookie':cookie})

player_response = None
if webpage:
@@ -2002,7 +2008,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
(?:
(?:channel|c|user|feed|hashtag)/|
(?:playlist|watch)\?.*?\blist=|
(?!(?:watch|embed|v|e)\b)
(?!(?:watch|embed|v|e|results)\b)
)
(?P<id>[^/?\#&]+)
'''
@@ -3079,11 +3085,10 @@ class YoutubeSearchDateIE(YoutubeSearchIE):
_SEARCH_PARAMS = 'CAI%3D'


r"""
class YoutubeSearchURLIE(YoutubeSearchIE):
IE_DESC = 'YouTube.com search URLs'
IE_NAME = 'youtube:search_url'
_VALID_URL = r'https?://(?:www\.)?youtube\.com/results\?(.*?&)?(?:search_query|q)=(?P<query>[^&]+)(?:[&]|$)'
_VALID_URL = r'https?://(?:www\.)?youtube\.com/results\?(.*?&)?(?:search_query|q)=(?:[^&]+)(?:[&]|$)'
_TESTS = [{
'url': 'https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video',
'playlist_mincount': 5,
@@ -3095,9 +3100,20 @@ class YoutubeSearchURLIE(YoutubeSearchIE):
'only_matching': True,
}]

@classmethod
def _make_valid_url(cls):
return cls._VALID_URL

def _real_extract(self, url):
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
query = (qs.get('search_query') or qs.get('q'))[0]
self._SEARCH_PARAMS = qs.get('sp', ('',))[0]
return self._get_n_results(query, self._MAX_RESULTS)
r"""
mobj = re.match(self._VALID_URL, url)
query = compat_urllib_parse_unquote_plus(mobj.group('query'))
# url_result(url, ie=None, video_id=None, video_title=None)
#_SEARCH_KEY='ytsearch'+ ()
webpage = self._download_webpage(url, query)
return self.playlist_result(self._process_page(webpage), playlist_title=query)
"""

@@ -173,7 +173,7 @@ def parseOpts(overrideArguments=None):
'--ignore-config',
action='store_true',
help='Do not read configuration files. '
'When given in the global configuration file /etc/youtube-dl.conf: '
'When given in the global configuration file /mod/etc/youtube-dl.conf: '
'Do not read the user configuration in ~/.config/youtube-dl/config '
'(%APPDATA%/youtube-dl/config.txt on Windows)')
general.add_option(
@@ -330,11 +330,11 @@ def parseOpts(overrideArguments=None):
))
selection.add_option(
'--no-playlist',
action='store_true', dest='noplaylist', default=False,
action='store_true', dest='noplaylist', default=None,
help='Download only the video, if the URL refers to a video and a playlist.')
selection.add_option(
'--yes-playlist',
action='store_false', dest='noplaylist', default=False,
action='store_false', dest='noplaylist', default=None,
help='Download the playlist, if the URL refers to a video and a playlist.')
selection.add_option(
'--age-limit',
@ -903,7 +903,7 @@ def parseOpts(overrideArguments=None):
|
|||
elif '--ignore-config' in command_line_conf:
|
||||
pass
|
||||
else:
|
||||
system_conf = _readOptions('/etc/youtube-dl.conf')
|
||||
system_conf = _readOptions('/mod/etc/youtube-dl.conf')
|
||||
if '--ignore-config' not in system_conf:
|
||||
user_conf = _readUserConf()
|
||||
|
||||
|
|
|
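Changing `default` from `False` to `None` on both playlist options matters because they share the `noplaylist` destination: with `None` as the default, later code can distinguish "user passed neither flag" from an explicit `--yes-playlist`. A simplified optparse sketch of the three resulting states (the surrounding parser setup is reduced to just these two options):

```python
from optparse import OptionParser

def parse(argv):
    parser = OptionParser()
    # Both flags write to the same destination; None marks "unspecified".
    parser.add_option('--no-playlist', action='store_true',
                      dest='noplaylist', default=None)
    parser.add_option('--yes-playlist', action='store_false',
                      dest='noplaylist', default=None)
    opts, _args = parser.parse_args(argv)
    return opts.noplaylist

print(parse([]))                   # None  -> caller may apply its own default
print(parse(['--no-playlist']))    # True
print(parse(['--yes-playlist']))   # False
```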
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/bin/env python
 # coding: utf-8

 from __future__ import unicode_literals

@@ -2153,9 +2153,36 @@ def sanitize_url(url):
             return re.sub(mistake, fixup, url)
     return url


+def extract_user_pass(url):
+    parts = compat_urlparse.urlsplit(url)
+    username = parts.username
+    password = parts.password
+    if username is not None:
+        if password is None:
+            password = ''
+        netloc = parts.hostname
+        if parts.port is not None:
+            netloc = parts.hostname + ':' + parts.port
+        parts = parts._replace(netloc=netloc)
+        url = compat_urlparse.urlunsplit(parts)
+    return url, username, password
+
+
 def sanitized_Request(url, *args, **kwargs):
-    return compat_urllib_request.Request(sanitize_url(url), *args, **kwargs)
+    url = sanitize_url(url)
+    url, username, password = extract_user_pass(url)
+    if username is not None:
+        # password is not None
+        auth_payload = username + ':' + password
+        auth_payload = base64.b64encode(auth_payload.encode('utf-8')).decode('utf-8')
+        auth_header = 'Basic ' + auth_payload
+        if len(args) >= 2:
+            args[1]['Authorization'] = auth_header
+        else:
+            if 'headers' not in kwargs:
+                kwargs['headers'] = {}
+            kwargs['headers']['Authorization'] = 'Basic ' + auth_payload
+    return compat_urllib_request.Request(url, *args, **kwargs)


 def expand_path(s):
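The new `sanitized_Request` path strips any `user:pass@` credentials out of the URL and resends them as an `Authorization: Basic …` header. A stdlib-only sketch of the same transformation (helper names are mine; note that `urlsplit().port` is an `int`, so rebuilding the netloc needs a `str()` cast, which the diff above omits):

```python
import base64
from urllib.parse import urlsplit, urlunsplit

def extract_user_pass(url):
    # Split credentials out of the netloc and rebuild the URL without them.
    parts = urlsplit(url)
    username, password = parts.username, parts.password
    if username is not None:
        password = password or ''
        netloc = parts.hostname
        if parts.port is not None:
            netloc += ':' + str(parts.port)  # port is an int: stringify it
        url = urlunsplit(parts._replace(netloc=netloc))
    return url, username, password

def basic_auth_header(username, password):
    # RFC 7617: base64("user:pass"), prefixed with the "Basic " scheme
    payload = base64.b64encode(('%s:%s' % (username, password)).encode('utf-8'))
    return 'Basic ' + payload.decode('utf-8')

url, user, pw = extract_user_pass('https://alice:secret@example.com:8443/feed')
print(url)                          # https://example.com:8443/feed
print(basic_auth_header(user, pw))  # Basic YWxpY2U6c2VjcmV0
```

Stripping the userinfo before building the `Request` keeps credentials out of any logged or retried URL, while the header carries them to the server on the first request rather than waiting for a 401 challenge.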
@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2021.06.06'
+__version__ = '2021.06.06.1'