Downloading Twitter content - 滴滴嘟嘟博客

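A long time ago there was a Python package called twittie, or something like that, which could download Twitter content. After Musk acquired the company, many download methods of this kind were shut down. With no usable software and nothing I particularly needed to save, I did not pursue it further. Recently I came across some interesting content that I wanted to collect, so I found this Python package, which also ships an executable for Windows: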

https://github.com/mikf/gallery-dl/releases

On Windows, create the configuration file "%APPDATA%\gallery-dl\config.json". An example:

{
    "extractor": {
        "twitter": {
            "videos": true,
            "images": true
        }
    },
    "output": {
        "template": "{author[screen_name]}/{tweet_id}_{num}.{extension}"
    }
}
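
As a minimal sketch, the same file could be generated from Python instead of being typed by hand. The helper names `build_config`, `default_config_path` and `write_config` are hypothetical, and the script only reproduces the keys shown above; it does not check whether gallery-dl honors each of them.

```python
import json
import os
from pathlib import Path


def build_config() -> dict:
    """Return the same settings as the hand-written example above."""
    return {
        "extractor": {
            "twitter": {
                "videos": True,
                "images": True,
            }
        },
        "output": {
            "template": "{author[screen_name]}/{tweet_id}_{num}.{extension}"
        },
    }


def default_config_path() -> Path:
    """Location named above; APPDATA is normally only set on Windows."""
    return Path(os.environ.get("APPDATA", ".")) / "gallery-dl" / "config.json"


def write_config(path: Path) -> Path:
    """Write the config as JSON, creating the parent folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(build_config(), indent=4), encoding="utf-8")
    return path
```

Calling `write_config(default_config_path())` would then create the file in the location described above.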

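You also need the cookies of a logged-in x.com account; without them it will not work. To get them, install the browser extension "Get cookies.txt LOCALLY", visit x.com, then export the cookies in Netscape format and save them as x.txt.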
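
Before running a download it can help to confirm that the exported file really is in Netscape format and contains cookies for x.com. A small sketch using Python's standard `http.cookiejar` module; the function names `is_x_domain` and `load_x_cookies` are hypothetical, and the check only inspects the file, it does not verify that the login is still valid.

```python
from http.cookiejar import MozillaCookieJar


def is_x_domain(domain: str) -> bool:
    """True for x.com itself and for any of its subdomains."""
    host = domain.lstrip(".")
    return host == "x.com" or host.endswith(".x.com")


def load_x_cookies(cookie_file: str) -> list:
    """Return the sorted names of the x.com cookies stored in cookie_file.

    Raises http.cookiejar.LoadError if the file is not in Netscape format.
    """
    jar = MozillaCookieJar(cookie_file)
    # Keep session and expired cookies too, so the result reflects the file as-is.
    jar.load(ignore_discard=True, ignore_expires=True)
    return sorted(cookie.name for cookie in jar if is_x_domain(cookie.domain))
```

`load_x_cookies("x.txt")` returning an empty list would suggest that the export was made on the wrong site or while logged out.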

After that, you can run the download from the command line, for example:

gallery-dl --cookies x.txt https://x.com/account
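
To collect several accounts in one go, the same command can be wrapped in a short Python script. This is only a sketch: `build_command` and `download_accounts` are hypothetical helper names, `account` is the placeholder from the example above, and the script assumes the `gallery-dl` executable is on the PATH.

```python
import subprocess


def build_command(account: str, cookie_file: str = "x.txt") -> list:
    """Build the gallery-dl command line for one x.com account."""
    return ["gallery-dl", "--cookies", cookie_file, f"https://x.com/{account}"]


def download_accounts(accounts, cookie_file: str = "x.txt") -> dict:
    """Run gallery-dl once per account and return each run's exit code."""
    results = {}
    for account in accounts:
        # Raises FileNotFoundError if gallery-dl is not installed or not on PATH.
        completed = subprocess.run(build_command(account, cookie_file))
        results[account] = completed.returncode
    return results
```

`download_accounts(["account"])` runs exactly the command shown above and reports its exit code.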

Because that JSON file does not set a destination folder, the downloaded files all end up by default under C:\Users\your_account_name\gallery-dl\twitter, with one subfolder per downloaded account.
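
To see what has been collected so far, the folder layout described above can be summarized with a few lines of Python. A sketch under the assumption that each account has its own subfolder directly under the `twitter` folder; the function names are hypothetical.

```python
from pathlib import Path


def default_twitter_root() -> Path:
    """Folder described above, relative to the current user's home folder."""
    return Path.home() / "gallery-dl" / "twitter"


def count_files_per_account(root: Path) -> dict:
    """Map each account subfolder under root to the number of files inside it."""
    counts = {}
    if not root.is_dir():
        return counts
    for account_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[account_dir.name] = sum(
            1 for f in account_dir.rglob("*") if f.is_file()
        )
    return counts
```

`count_files_per_account(default_twitter_root())` returns an empty dictionary if nothing has been downloaded yet.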

gallery-dl --help
Usage: gallery-dl [OPTIONS] URL [URL...]

General Options:
  -h, --help                  Print this help message and exit
  --version                   Print program version and exit
  -f, --filename FORMAT       Filename format string for downloaded files ('/O' for "original" filenames)
  -d, --destination PATH      Target location for file downloads
  -D, --directory PATH        Exact location for file downloads
  -X, --extractors PATH       Load external extractors from PATH
  -a, --user-agent UA         User-Agent request header
  --clear-cache MODULE        Delete cached login sessions, cookies, etc. for MODULE (ALL to delete everything)
  --compat                    Restore legacy 'category' names

Update Options:
  -U, --update-check          Check if a newer version is available

Input Options:
  -i, --input-file FILE       Download URLs found in FILE ('-' for stdin). More than one --input-file can be specified
  -I, --input-file-comment FILE
                              Download URLs found in FILE. Comment them out after they were downloaded successfully.
  -x, --input-file-delete FILE
                              Download URLs found in FILE. Delete them after they were downloaded successfully.
  --no-input                  Do not prompt for passwords/tokens

Output Options:
  -q, --quiet                 Activate quiet mode
  -w, --warning               Print only warnings and errors
  -v, --verbose               Print various debugging information
  -g, --get-urls              Print URLs instead of downloading
  -G, --resolve-urls          Print URLs instead of downloading; resolve intermediary URLs
  -j, --dump-json             Print JSON information
  -J, --resolve-json          Print JSON information; resolve intermediary URLs
  -s, --simulate              Simulate data extraction; do not download anything
  -E, --extractor-info        Print extractor defaults and settings
  -K, --list-keywords         Print a list of available keywords and example values for the given URLs
  -e, --error-file FILE       Add input URLs which returned an error to FILE
  -N, --print [EVENT:]FORMAT  Write FORMAT during EVENT (default 'prepare') to standard output instead of downloading
                              files. Can be used multiple times. Examples: 'id' or 'post:{md5[:8]}'
  --Print [EVENT:]FORMAT      Like --print, but downloads files as well
  --print-to-file [EVENT:]FORMAT FILE
                              Append FORMAT during EVENT to FILE instead of downloading files. Can be used multiple
                              times
  --Print-to-file [EVENT:]FORMAT FILE
                              Like --print-to-file, but downloads files as well
  --list-modules              Print a list of available extractor modules
  --list-extractors [CATEGORIES]
                              Print a list of extractor classes with description, (sub)category and example URL
  --write-log FILE            Write logging output to FILE
  --write-unsupported FILE    Write URLs, which get emitted by other extractors but cannot be handled, to FILE
  --write-pages               Write downloaded intermediary pages to files in the current directory to debug problems
  --print-traffic             Display sent and read HTTP traffic
  --no-colors                 Do not emit ANSI color codes in output

Networking Options:
  -R, --retries N             Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4)
  --http-timeout SECONDS      Timeout for HTTP connections (default: 30.0)
  --proxy URL                 Use the specified proxy
  --source-address IP         Client-side IP address to bind to
  -4, --force-ipv4            Make all connections via IPv4
  -6, --force-ipv6            Make all connections via IPv6
  --no-check-certificate      Disable HTTPS certificate validation

Downloader Options:
  -r, --limit-rate RATE       Maximum download rate (e.g. 500k, 2.5M, or 800k-2M)
  --chunk-size SIZE           Size of in-memory data chunks (default: 32k)
  --sleep SECONDS             Number of seconds to wait before each download. This can be either a constant value or a
                              range (e.g. 2.7 or 2.0-3.5)
  --sleep-request SECONDS     Number of seconds to wait between HTTP requests during data extraction
  --sleep-429 SECONDS         Number of seconds to wait when receiving a '429 Too Many Requests' response
  --sleep-extractor SECONDS   Number of seconds to wait before starting data extraction for an input URL
  --no-part                   Do not use .part files
  --no-skip                   Do not skip downloads; overwrite existing files
  --no-mtime                  Do not set file modification times according to Last-Modified HTTP response headers
  --no-download               Do not download any files

Configuration Options:
  -o, --option KEY=VALUE      Additional options. Example: -o browser=firefox
  -c, --config FILE           Additional configuration files
  --config-yaml FILE          Additional configuration files in YAML format
  --config-toml FILE          Additional configuration files in TOML format
  --config-create             Create a basic configuration file
  --config-status             Show configuration file status
  --config-open               Open configuration file in external application
  --config-ignore             Do not read default configuration files

Authentication Options:
  -u, --username USER         Username to login with
  -p, --password PASS         Password belonging to the given username
  --netrc                     Enable .netrc authentication data

Cookie Options:
  -C, --cookies FILE          File to load additional cookies from
  --cookies-export FILE       Export session cookies to FILE
  --cookies-from-browser BROWSER[/DOMAIN][+KEYRING][:PROFILE][::CONTAINER]
                              Name of the browser to load cookies from, with optional domain prefixed with '/', keyring
                              name prefixed with '+', profile prefixed with ':', and container prefixed with '::'
                              ('none' for no container (default), 'all' for all containers)

Selection Options:
  -A, --abort N[:TARGET]      Stop current extractor(s) after N consecutive file downloads were skipped. Specify a
                              TARGET to set how many levels to ascend or to which subcategory to jump to. Examples: '-A
                              3', '-A 3:2', '-A 3:manga'
  -T, --terminate N           Stop current & parent extractors and proceed with the next input URL after N consecutive
                              file downloads were skipped
  --filesize-min SIZE         Do not download files smaller than SIZE (e.g. 500k or 2.5M)
  --filesize-max SIZE         Do not download files larger than SIZE (e.g. 500k or 2.5M)
  --download-archive FILE     Record successfully downloaded files in FILE and skip downloading any file already in it
  --range RANGE               Index range(s) specifying which files to download. These can be either a constant value,
                              range, or slice (e.g. '5', '8-20', or '1:24:3')
  --chapter-range RANGE       Like '--range', but applies to manga chapters and other delegated URLs
  --filter EXPR               Python expression controlling which files to download. Files for which the expression
                              evaluates to False are ignored. Available keys are the filename-specific ones listed by
                              '-K'. Example: --filter "image_width >= 1000 and rating in ('s', 'q')"
  --chapter-filter EXPR       Like '--filter', but applies to manga chapters and other delegated URLs

Post-processing Options:
  -P, --postprocessor NAME    Activate the specified post processor
  --no-postprocessors         Do not run any post processors
  -O, --postprocessor-option KEY=VALUE
                              Additional post processor options
  --write-metadata            Write metadata to separate JSON files
  --write-info-json           Write gallery metadata to a info.json file
  --write-tags                Write image tags to separate text files
  --zip                       Store downloaded files in a ZIP archive
  --cbz                       Store downloaded files in a CBZ archive
  --mtime NAME                Set file modification times according to metadata selected by NAME. Examples: 'date' or
                              'status[date]'
  --rename FORMAT             Rename previously downloaded files from FORMAT to the current filename format
  --rename-to FORMAT          Rename previously downloaded files from the current filename format to FORMAT
  --ugoira FMT                Convert Pixiv Ugoira to FMT using FFmpeg. Supported formats are 'webm', 'mp4', 'gif',
                              'vp8', 'vp9', 'vp9-lossless', 'copy', 'zip'.
  --exec CMD                  Execute CMD for each downloaded file. Supported replacement fields are {} or {_path},
                              {_directory}, {_filename}. Example: --exec "convert {} {}.png && rm {}"
  --exec-after CMD            Execute CMD after all files were downloaded. Example: --exec-after "cd {_directory} &&
                              convert * ../doc.pdf"
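
Several of the options in this help text look useful for repeated collection runs. The sketch below combines `--download-archive`, `--destination` and `--sleep-request` exactly as they are described in the help output. The function name `incremental_command` and the default file name `archive.sqlite3` are assumptions made for illustration, and I have not tested this combination against x.com.

```python
def incremental_command(url, cookie_file="x.txt", archive_file="archive.sqlite3",
                        destination=None, sleep_request=None):
    """Build a gallery-dl command that skips files already recorded in archive_file."""
    command = [
        "gallery-dl",
        "--cookies", cookie_file,
        # --download-archive: record downloaded files and skip them on later runs
        "--download-archive", archive_file,
    ]
    if destination is not None:
        # -d / --destination: target location for file downloads
        command += ["--destination", destination]
    if sleep_request is not None:
        # --sleep-request: seconds to wait between HTTP requests during extraction
        command += ["--sleep-request", str(sleep_request)]
    command.append(url)
    return command
```

Passing the returned list to `subprocess.run` would start the download; printing it first is an easy way to check the command before anything is fetched.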

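Although the config.json above only mentions videos and images, the text content was also downloaded, as separate files.

So far I have not studied this tool in any depth. I only wanted to collect some content, and it did that for me.

I will update this post when there is a real need.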