Downloading Twitter content
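A long time ago there was a Python package, twittie or something like that, that could download content from Twitter. Since Musk's acquisition, many download methods of this kind have been shut down. With no usable software left and nothing in particular I needed, I didn't pursue it further. Recently I came across some interesting content that I wanted to save, so I looked again and found this Python package, which also ships as an executable for Windows: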
https://github.com/mikf/gallery-dl/releases
On Windows, put the configuration in `%APPDATA%\gallery-dl\config.json`. An example:
```json
{
  "extractor": {
    "twitter": {
      "videos": true,
      "images": true
    }
  },
  "output": {
    "template": "{author[screen_name]}/{tweet_id}_{num}.{extension}"
  }
}
```

You also need the cookies from a logged-in x.com session; without them it won't work. One way to get them is the browser extension "Get cookies.txt Local": install it, visit x.com, export the cookies in Netscape format, and save them as x.txt.
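Since a bad or incomplete cookie export is an easy way for the download to fail, it can help to sanity-check x.txt first. The sketch below is not part of gallery-dl; it is a small plain-Python parser for the Netscape cookies format that reports which expected x.com cookies are missing. The cookie names `auth_token` and `ct0` are an assumption about how x.com keeps a login session, not something stated in this post, so adjust them if your export looks different.

```python
import re
from pathlib import Path

# Assumption: the cookies x.com uses for a logged-in session.
EXPECTED_COOKIES = {"auth_token", "ct0"}

# Netscape cookie files conventionally start with this header line.
HEADER_RE = re.compile(r"#( Netscape)? HTTP Cookie File")
# HttpOnly cookies are written as "comment" lines with this prefix.
HTTPONLY_PREFIX = "#HttpOnly_"


def load_netscape_cookies(path):
    """Return {(domain, name): value} parsed from a Netscape cookies file."""
    # utf-8-sig also strips a byte-order mark if the exporter wrote one.
    lines = Path(path).read_text(encoding="utf-8-sig").splitlines()
    if not lines or not HEADER_RE.match(lines[0]):
        raise ValueError(f"{path} does not look like a Netscape cookies file")
    cookies = {}
    for line in lines[1:]:
        if line.startswith(HTTPONLY_PREFIX):
            line = line[len(HTTPONLY_PREFIX):]
        elif not line.strip() or line.startswith("#"):
            continue  # blank line or ordinary comment
        fields = line.split("\t")
        if len(fields) != 7:
            raise ValueError(f"malformed cookie line: {line!r}")
        # domain, subdomain flag, path, secure, expiry, name, value
        domain, _flag, _path, _secure, _expires, name, value = fields
        cookies[(domain, name)] = value
    return cookies


def missing_x_cookies(path, expected=EXPECTED_COOKIES):
    """Return the expected cookie names that are absent for x.com."""
    names = {
        name
        for (domain, name) in load_netscape_cookies(path)
        if domain.lstrip(".") == "x.com" or domain.endswith(".x.com")
    }
    return set(expected) - names
```

For example, `missing_x_cookies("x.txt")` returns an empty set when both assumed cookies are present, and otherwise names the ones to look for in a fresh export.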
After that, you can run the download from the command line, for example:
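```
gallery-dl --cookies x.txt https://x.com/account
```

Because that JSON file does not set a destination folder, the downloaded files end up in the default location, C:\Users\your_account_name\gallery-dl\twitter (gallery-dl's default base directory is ./gallery-dl under the current working directory, presumably the user folder here), with a separate subfolder for each account downloaded.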
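Instead of relying on the default folder, the destination can also be chosen per run with the `-d, --destination PATH` ("Target location") or `-D, --directory PATH` ("Exact location") options listed in `gallery-dl --help`. A minimal Python wrapper that assembles such a command line might look like this; `build_gallery_dl_command` and `run_gallery_dl` are hypothetical helper names, not part of gallery-dl.

```python
import shutil
import subprocess


def build_gallery_dl_command(url, cookies_file, destination=None, exact=False):
    """Assemble a gallery-dl command line (hypothetical helper).

    destination: optional folder, passed as -d (target location) by default
    or as -D (exact location) when exact=True.
    """
    cmd = ["gallery-dl", "--cookies", str(cookies_file)]
    if destination is not None:
        cmd += ["-D" if exact else "-d", str(destination)]
    cmd.append(url)
    return cmd


def run_gallery_dl(cmd):
    """Run the command if gallery-dl is on PATH; raise on a non-zero exit."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```

Usage would be something like `run_gallery_dl(build_gallery_dl_command("https://x.com/account", "x.txt", r"D:\twitter-archive"))`, where the destination folder is only an illustration. Passing the arguments as a list avoids shell quoting problems with Windows paths.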
Running `gallery-dl --help` prints the full list of options:

```
Usage: gallery-dl [OPTIONS] URL [URL...]

General Options:
  -h, --help                  Print this help message and exit
      --version               Print program version and exit
  -f, --filename FORMAT       Filename format string for downloaded files
                              ('/O' for "original" filenames)
  -d, --destination PATH      Target location for file downloads
  -D, --directory PATH        Exact location for file downloads
  -X, --extractors PATH       Load external extractors from PATH
  -a, --user-agent UA         User-Agent request header
      --clear-cache MODULE    Delete cached login sessions, cookies, etc. for
                              MODULE (ALL to delete everything)
      --compat                Restore legacy 'category' names

Update Options:
  -U, --update-check          Check if a newer version is available

Input Options:
  -i, --input-file FILE       Download URLs found in FILE ('-' for stdin).
                              More than one --input-file can be specified
  -I, --input-file-comment FILE
                              Download URLs found in FILE. Comment them out
                              after they were downloaded successfully.
  -x, --input-file-delete FILE
                              Download URLs found in FILE. Delete them after
                              they were downloaded successfully.
      --no-input              Do not prompt for passwords/tokens

Output Options:
  -q, --quiet                 Activate quiet mode
  -w, --warning               Print only warnings and errors
  -v, --verbose               Print various debugging information
  -g, --get-urls              Print URLs instead of downloading
  -G, --resolve-urls          Print URLs instead of downloading; resolve
                              intermediary URLs
  -j, --dump-json             Print JSON information
  -J, --resolve-json          Print JSON information; resolve intermediary
                              URLs
  -s, --simulate              Simulate data extraction; do not download
                              anything
  -E, --extractor-info        Print extractor defaults and settings
  -K, --list-keywords         Print a list of available keywords and example
                              values for the given URLs
  -e, --error-file FILE       Add input URLs which returned an error to FILE
  -N, --print [EVENT:]FORMAT  Write FORMAT during EVENT (default 'prepare') to
                              standard output instead of downloading files.
                              Can be used multiple times. Examples: 'id' or
                              'post:{md5[:8]}'
      --Print [EVENT:]FORMAT  Like --print, but downloads files as well
      --print-to-file [EVENT:]FORMAT FILE
                              Append FORMAT during EVENT to FILE instead of
                              downloading files. Can be used multiple times
      --Print-to-file [EVENT:]FORMAT FILE
                              Like --print-to-file, but downloads files as
                              well
      --list-modules          Print a list of available extractor modules
      --list-extractors [CATEGORIES]
                              Print a list of extractor classes with
                              description, (sub)category and example URL
      --write-log FILE        Write logging output to FILE
      --write-unsupported FILE
                              Write URLs, which get emitted by other
                              extractors but cannot be handled, to FILE
      --write-pages           Write downloaded intermediary pages to files in
                              the current directory to debug problems
      --print-traffic         Display sent and read HTTP traffic
      --no-colors             Do not emit ANSI color codes in output

Networking Options:
  -R, --retries N             Maximum number of retries for failed HTTP
                              requests or -1 for infinite retries (default: 4)
      --http-timeout SECONDS  Timeout for HTTP connections (default: 30.0)
      --proxy URL             Use the specified proxy
      --source-address IP     Client-side IP address to bind to
  -4, --force-ipv4            Make all connections via IPv4
  -6, --force-ipv6            Make all connections via IPv6
      --no-check-certificate  Disable HTTPS certificate validation

Downloader Options:
  -r, --limit-rate RATE       Maximum download rate (e.g. 500k, 2.5M, or
                              800k-2M)
      --chunk-size SIZE       Size of in-memory data chunks (default: 32k)
      --sleep SECONDS         Number of seconds to wait before each download.
                              This can be either a constant value or a range
                              (e.g. 2.7 or 2.0-3.5)
      --sleep-request SECONDS
                              Number of seconds to wait between HTTP requests
                              during data extraction
      --sleep-429 SECONDS     Number of seconds to wait when receiving a '429
                              Too Many Requests' response
      --sleep-extractor SECONDS
                              Number of seconds to wait before starting data
                              extraction for an input URL
      --no-part               Do not use .part files
      --no-skip               Do not skip downloads; overwrite existing files
      --no-mtime              Do not set file modification times according to
                              Last-Modified HTTP response headers
      --no-download           Do not download any files

Configuration Options:
  -o, --option KEY=VALUE      Additional options. Example: -o browser=firefox
  -c, --config FILE           Additional configuration files
      --config-yaml FILE      Additional configuration files in YAML format
      --config-toml FILE      Additional configuration files in TOML format
      --config-create         Create a basic configuration file
      --config-status         Show configuration file status
      --config-open           Open configuration file in external application
      --config-ignore         Do not read default configuration files

Authentication Options:
  -u, --username USER         Username to login with
  -p, --password PASS         Password belonging to the given username
      --netrc                 Enable .netrc authentication data

Cookie Options:
  -C, --cookies FILE          File to load additional cookies from
      --cookies-export FILE   Export session cookies to FILE
      --cookies-from-browser BROWSER[/DOMAIN][+KEYRING][:PROFILE][::CONTAINER]
                              Name of the browser to load cookies from, with
                              optional domain prefixed with '/', keyring name
                              prefixed with '+', profile prefixed with ':', and
                              container prefixed with '::' ('none' for no
                              container (default), 'all' for all containers)

Selection Options:
  -A, --abort N[:TARGET]      Stop current extractor(s) after N consecutive
                              file downloads were skipped. Specify a TARGET to
                              set how many levels to ascend or to which
                              subcategory to jump to. Examples: '-A 3',
                              '-A 3:2', '-A 3:manga'
  -T, --terminate N           Stop current & parent extractors and proceed with
                              the next input URL after N consecutive file
                              downloads were skipped
      --filesize-min SIZE     Do not download files smaller than SIZE (e.g.
                              500k or 2.5M)
      --filesize-max SIZE     Do not download files larger than SIZE (e.g.
                              500k or 2.5M)
      --download-archive FILE
                              Record successfully downloaded files in FILE and
                              skip downloading any file already in it
      --range RANGE           Index range(s) specifying which files to
                              download. These can be either a constant value,
                              range, or slice (e.g. '5', '8-20', or '1:24:3')
      --chapter-range RANGE   Like '--range', but applies to manga chapters and
                              other delegated URLs
      --filter EXPR           Python expression controlling which files to
                              download. Files for which the expression
                              evaluates to False are ignored. Available keys
                              are the filename-specific ones listed by '-K'.
                              Example: --filter "image_width >= 1000 and
                              rating in ('s', 'q')"
      --chapter-filter EXPR   Like '--filter', but applies to manga chapters
                              and other delegated URLs

Post-processing Options:
  -P, --postprocessor NAME    Activate the specified post processor
      --no-postprocessors     Do not run any post processors
  -O, --postprocessor-option KEY=VALUE
                              Additional post processor options
      --write-metadata        Write metadata to separate JSON files
      --write-info-json       Write gallery metadata to a info.json file
      --write-tags            Write image tags to separate text files
      --zip                   Store downloaded files in a ZIP archive
      --cbz                   Store downloaded files in a CBZ archive
      --mtime NAME            Set file modification times according to
                              metadata selected by NAME. Examples: 'date' or
                              'status[date]'
      --rename FORMAT         Rename previously downloaded files from FORMAT
                              to the current filename format
      --rename-to FORMAT      Rename previously downloaded files from the
                              current filename format to FORMAT
      --ugoira FMT            Convert Pixiv Ugoira to FMT using FFmpeg.
                              Supported formats are 'webm', 'mp4', 'gif',
                              'vp8', 'vp9', 'vp9-lossless', 'copy', 'zip'.
      --exec CMD              Execute CMD for each downloaded file. Supported
                              replacement fields are {} or {_path},
                              {_directory}, {_filename}. Example: --exec
                              "convert {} {}.png && rm {}"
      --exec-after CMD        Execute CMD after all files were downloaded.
                              Example: --exec-after "cd {_directory} &&
                              convert * ../doc.pdf"
```

Although the config.json above only enables videos and images, the text content of the tweets was downloaded as well, as separate files.
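Two of the options above suit repeated collection runs: `-i, --input-file FILE` reads the URLs to download from a file, and `--download-archive FILE` records finished downloads so later runs skip them. The sketch below writes a list of account URLs to an input file, one per line, and builds the matching command. The helper names and the file names (accounts.txt, archive.sqlite3) are hypothetical choices for illustration.

```python
from pathlib import Path


def write_input_file(urls, path):
    """Write one URL per line (the layout read by -i) and return the count."""
    cleaned = [u.strip() for u in urls if u.strip()]  # drop blank entries
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)


def build_batch_command(input_file, cookies_file, archive_file):
    """Build a gallery-dl call that reads URLs from input_file and skips
    files already recorded in archive_file (hypothetical helper)."""
    return [
        "gallery-dl",
        "--cookies", str(cookies_file),
        "--download-archive", str(archive_file),
        "-i", str(input_file),
    ]
```

After `write_input_file(["https://x.com/account"], "accounts.txt")`, the command from `build_batch_command("accounts.txt", "x.txt", "archive.sqlite3")` can be passed to `subprocess.run(..., check=True)`; rerunning it later should then only fetch what is not yet in the archive.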
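So far I haven't studied this tool in any depth. My goal was only to collect the content, and it did that for me.
I'll update this post when there's a real need.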
https://dididudu998.github.io/posts/下载twitter的内容/