FFMPEG Flow Analysis


FFMPEG Architecture Analysis
1. Introduction
FFmpeg is a complete open-source solution that integrates recording, conversion, and audio/video encoding and decoding.

FFmpeg is developed on Linux, but it can be compiled and used on most operating systems.

FFmpeg supports more than 40 encodings, including MPEG, DivX, MPEG4, AC3, DV and FLV, and more than 90 decoding formats, including AVI, MPEG, OGG, Matroska and ASF. Open-source players such as TCPMP, VLC and MPlayer all use FFmpeg.

The FFmpeg source tree mainly contains the subdirectories libavcodec, libavformat and libavutil. libavcodec holds the individual encode/decode modules, libavformat holds the muxer/demuxer modules, and libavutil holds auxiliary modules such as memory operations.

Taking the flv file format of Flash movies as an example: the muxer/demuxer files flvenc.c and flvdec.c live in the libavformat directory, while the encode/decode files mpegvideo.c and h263de.c live in the libavcodec directory.

2. muxer/demuxer and encoder/decoder: definition and initialization
The implementations of muxer/demuxer and encoder/decoder in FFmpeg have much in common. The biggest difference between the two is that muxers and demuxers use two distinct structures, AVOutputFormat and AVInputFormat, whereas encoders and decoders both use the AVCodec structure.

What muxer/demuxer and encoder/decoder have in common in FFmpeg:
Both are initialized inside the av_register_all() function called at the start of main().
Both are kept as linked lists in global variables:
muxers/demuxers are stored in the global variables AVOutputFormat *first_oformat and AVInputFormat *first_iformat;
encoders/decoders are all stored in the global variable AVCodec *first_avcodec.
Both expose their public interface through function pointers.
The interface exposed by a demuxer is:
int (*read_probe)(AVProbeData *);
int (*read_header)(struct AVFormatContext *, AVFormatParameters *ap);
int (*read_packet)(struct AVFormatContext *, AVPacket *pkt);
int (*read_close)(struct AVFormatContext *);
int (*read_seek)(struct AVFormatContext *, int stream_index, int64_t timestamp, int flags);
The interface exposed by a muxer is:
int (*write_header)(struct AVFormatContext *);
int (*write_packet)(struct AVFormatContext *, AVPacket *pkt);
int (*write_trailer)(struct AVFormatContext *);
Encoders and decoders share the same interface, except that a given codec implements only the encode or only the decode function:
int (*init)(AVCodecContext *);
int (*encode)(AVCodecContext *, uint8_t *buf, int buf_size, void *data);
int (*close)(AVCodecContext *);
int (*decode)(AVCodecContext *, void *outdata, int *outdata_size, uint8_t *buf, int buf_size);
Let us again use the flv file format to illustrate muxer/demuxer initialization.

In the av_register_all(void) function in libavformat\allformats.c, executing
REGISTER_MUXDEMUX(FLV, flv);
registers the flv_muxer and flv_demuxer variables, which support the flv format, at the tail of the global linked lists first_oformat and first_iformat respectively.
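Judging from the REGISTER_ENCODER/REGISTER_DECODER macros quoted later in this article, the registration macro expands roughly as follows (a sketch of the allformats.c of this era; check your tree for the exact form):
#define REGISTER_MUXER(X,x) \
    if(ENABLE_##X##_MUXER)   av_register_output_format(&x##_muxer)
#define REGISTER_DEMUXER(X,x) \
    if(ENABLE_##X##_DEMUXER) av_register_input_format(&x##_demuxer)
#define REGISTER_MUXDEMUX(X,x) REGISTER_MUXER(X,x); REGISTER_DEMUXER(X,x)
av_register_output_format() and av_register_input_format() simply append their argument to the tails of first_oformat and first_iformat, which is how the two linked lists are built.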

flv_muxer is defined in libavformat\flvenc.c as follows:
AVOutputFormat flv_muxer = {
"flv",
"flv format",
"video/x-flv",
"flv",
sizeof(FLVContext),
#ifdef CONFIG_LIBMP3LAME
CODEC_ID_MP3,
#else // CONFIG_LIBMP3LAME
CODEC_ID_NONE,
CODEC_ID_FLV1,
flv_write_header,
flv_write_packet,
flv_write_trailer,
.codec_tag= (const AVCodecTag*[]){flv_video_codec_ids, flv_audio_codec_ids, 0},
};
The AVOutputFormat structure is defined as follows:
typedef struct AVOutputFormat {
const char *name;
const char *long_name;
const char *mime_type;
const char *extensions; /**< comma separated filename extensions */
/** size of private data so that it can be allocated in the wrapper */
int priv_data_size;
/* output support */
enum CodecID audio_codec; /**< default audio codec */
enum CodecID video_codec; /**< default video codec */
int (*write_header)(struct AVFormatContext *);
int (*write_packet)(struct AVFormatContext *, AVPacket *pkt);
int (*write_trailer)(struct AVFormatContext *);
/** can use flags: AVFMT_NOFILE, AVFMT_NEEDNUMBER, AVFMT_GLOBALHEADER */
int flags;
/** currently only used to set pixel format if not YUV420P */
int (*set_parameters)(struct AVFormatContext *, AVFormatParameters *);
int (*interleave_packet)(struct AVFormatContext *, AVPacket *out, AVPacket *in, int flush);
/**
 * list of supported codec_id-codec_tag pairs, ordered by "better choice first"
 * the arrays are all CODEC_ID_NONE terminated
 */
const struct AVCodecTag **codec_tag;
/* private fields */
struct AVOutputFormat *next;
} AVOutputFormat;
From the definition of AVOutputFormat we can see that the first and second members of the flv_muxer initializer are the muxer's name and long name, the third and fourth are the corresponding MIME type and filename extension, the fifth is the size of the corresponding private structure, and the sixth and seventh are the default audio and video codec IDs. After those come the three key interface functions; the muxer's functionality is implemented by calling these three interfaces.

flv_demuxer is defined in libavformat\flvdec.c as follows. Similar to flv_muxer, it mainly sets up five interface functions. The flv_probe interface tests whether an incoming block of data matches the current file format; it is used when matching the current demuxer (a sketch of a typical probe follows the definition below).

AVInputFormat flv_demuxer = {
"flv",
"flv format",
0,
flv_probe,
flv_read_header,
flv_read_packet,
flv_read_close,
flv_read_seek,
.extensions = "flv",
.value = CODEC_ID_FLV1,
};
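For illustration, a probe function of this kind usually just inspects the first bytes of the buffer and returns a confidence score. A hedged sketch (the real flv_probe in flvdec.c may differ in detail):
static int flv_probe(AVProbeData *p)
{
    const uint8_t *d = p->buf;
    /* every FLV file begins with the signature bytes "FLV" */
    if (p->buf_size > 3 && d[0] == 'F' && d[1] == 'L' && d[2] == 'V')
        return AVPROBE_SCORE_MAX;
    return 0;
}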
In the av_register_all(void) function mentioned above, all encoders/decoders are initialized by calling the avcodec_register_all(void) function in libavcodec\allcodecs.c. Because not every coding format supports both encode and decode, there are the following three registration macros:
#define REGISTER_ENCODER(X,x) \
if(ENABLE_##X##_ENCODER) register_avcodec(&x##_encoder)
#define REGISTER_DECODER(X,x) \
if(ENABLE_##X##_DECODER) register_avcodec(&x##_decoder)
#define REGISTER_ENCDEC(X,x) REGISTER_ENCODER(X,x); REGISTER_DECODER(X,x)
For example, the flv_encoder and flv_decoder variables that support flv are created in libavcodec\mpegvideo.c and libavcodec\h263de.c respectively.

3. Matching the current muxer/demuxer
During an FFmpeg file conversion, the first step is to match a suitable demuxer and muxer based on the input file and the extension of the output file [FIXME]. The matched demuxer and muxer are stored in the global variables file_iformat and file_oformat, defined in ffmpeg.c as follows:
static AVInputFormat *file_iformat;
static AVOutputFormat *file_oformat;
3.1 Demuxer matching
The function static AVInputFormat *av_probe_input_format2(AVProbeData *pd, int is_opened, int *score_max) in libavformat\utils.c takes the given probe data and calls each demuxer's read_probe interface in turn, to decide whether that demuxer matches the content of the input file.

The call sequence is as follows:
void parse_options(int argc, char **argv, const OptionDef *options,
void (* parse_arg_function)(const char *));
static void opt_input_file(const char *filename)
int av_open_input_file(……)
AVInputFormat *av_probe_input_format(AVProbeData *pd,
int is_opened)
static AVInputFormat *av_probe_input_format2(……)
The opt_input_file function is stored in the const OptionDef options[] array; it is invoked when void parse_options(int argc, char **argv, const OptionDef *options) parses the "-i" argument in argv, i.e. the input file name.
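The core of av_probe_input_format2() is a scoring loop over the demuxer list. A hedged sketch of its shape (the real function in libavformat\utils.c carries additional checks):
AVInputFormat *fmt1, *fmt = NULL;
int score;
for (fmt1 = first_iformat; fmt1 != NULL; fmt1 = fmt1->next) {
    score = 0;
    if (fmt1->read_probe)
        score = fmt1->read_probe(pd);    /* e.g. flv_probe() */
    if (score > *score_max) {            /* keep the best-scoring demuxer */
        *score_max = score;
        fmt = fmt1;
    }
}
return fmt;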

3.2 Muxer matching
Unlike demuxer matching, muxer matching is done by calling the guess_format function, based on the extension of the output file name in main()'s argv:

void parse_options(int argc, char **argv, const OptionDef *options,
void (* parse_arg_function)(const char *));
void parse_arg_file(const char *filename)
static void opt_output_file(const char *filename)
AVOutputFormat *guess_format(const char *short_name,
const char *filename,
const char *mime_type)
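Internally, guess_format() also walks a global list, scoring each registered muxer against the hints it was given. A hedged sketch (see libavformat\utils.c for the real scoring):
AVOutputFormat *fmt1, *fmt = NULL;
int score, score_max = 0;
for (fmt1 = first_oformat; fmt1 != NULL; fmt1 = fmt1->next) {
    score = 0;
    if (short_name && fmt1->name && !strcmp(fmt1->name, short_name))
        score += 100;                     /* exact short name, e.g. "flv" */
    if (mime_type && fmt1->mime_type && !strcmp(fmt1->mime_type, mime_type))
        score += 10;
    if (filename && fmt1->extensions && match_ext(filename, fmt1->extensions))
        score += 5;                       /* output file extension matches */
    if (score > score_max) {
        score_max = score;
        fmt = fmt1;
    }
}
return fmt;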
3.3 Matching the current encoder/decoder
Apart from parse_options(), which parses the command-line arguments and initializes the demuxer and muxer, everything else in main() happens inside the av_encode() function.

libavcodec\utils.c contains the following two functions:
AVCodec *avcodec_find_encoder(enum CodecID id)
AVCodec *avcodec_find_decoder(enum CodecID id)
Their job is to find the matching encoder and decoder for a given CodecID.
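Both lookups are plain walks over the first_avcodec linked list; a sketch close to the utils.c of this era:
AVCodec *avcodec_find_encoder(enum CodecID id)
{
    AVCodec *p = first_avcodec;
    while (p) {
        /* an encoder must implement encode(); a decoder, decode() */
        if (p->encode != NULL && p->id == id)
            return p;            /* e.g. libx264_encoder for CODEC_ID_H264 */
        p = p->next;
    }
    return NULL;                 /* no encoder registered for this id */
}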

At the beginning of av_encode(), each AVInputStream and AVOutputStream is initialized first; then these two functions are called, and the matched encoder and decoder are stored in
AVInputStream->AVStream *st->AVCodecContext *codec->struct AVCodec *codec and
AVOutputStream->AVStream *st->AVCodecContext *codec->struct AVCodec *codec respectively.

4. Other main data structures
4.1 AVFormatContext
AVFormatContext is the main structure that implements input and output and stores the related data during an FFmpeg format conversion. Every input and output file has a corresponding entry in the global pointer arrays defined as follows:

static AVFormatContext *output_files[MAX_FILES];
static AVFormatContext *input_files[MAX_FILES];
Because input and output share the same structure, the iformat or oformat member, defined as follows, has to be set accordingly:

struct AVInputFormat *iformat;
struct AVOutputFormat *oformat;
For any one AVFormatContext, these two members must not both be set; that is, a single AVFormatContext cannot hold a demuxer and a muxer at the same time.

After the matching muxer and demuxer have been found in parse_options() at the start of main(), each input and output AVFormatContext structure is initialized according to the argv arguments and stored in the corresponding output_files and input_files pointer arrays. In av_encode(), output_files and input_files are passed in as function arguments and are not used anywhere else.

4.2 AVCodecContext
This structure stores the AVCodec pointer and codec-related data, such as a video's width and height or an audio stream's sample rate. Its codec_type and codec_id members are the most important ones for encoder/decoder matching:

enum CodecType codec_type; /* see CODEC_TYPE_xxx */
enum CodecID codec_id; /* see CODEC_ID_xxx */
As shown above, codec_type stores a media type such as CODEC_TYPE_VIDEO or CODEC_TYPE_AUDIO, and codec_id stores a coding format such as CODEC_ID_FLV1 or CODEC_ID_VP6F.

Taking flv support as the example again: in the av_open_input_file(……) function described earlier, once the correct AVInputFormat demuxer has been matched, av_open_input_stream() calls the AVInputFormat's read_header interface, which executes the flv_read_header() function in flvdec.c. Inside flv_read_header(), the appropriate video or audio AVStream is created from the data in the file header, and the correct codec_type is set in the AVStream's AVCodecContext. The codec_id value is set later, during decoding, when flv_read_packet() runs and inspects each packet header.

4.3 AVStream
The AVStream structure stores the information associated with one data stream, such as its codec and data segments. Its two most important members are:
AVCodecContext *codec; /**< codec context */
void *priv_data;
The codec pointer stores the encoder or decoder structure described in the previous section.

The priv_data pointer stores data tied to the specific stream being coded or decoded. As the following code shows, during ASF decoding priv_data holds the data of the ASFStream structure:

AVStream *st;
ASFStream *asf_st;
……
st->priv_data = asf_st;
4.4 AVInputStream / AVOutputStream
Depending on whether a stream is input or output, the AVStream structures described above are wrapped in AVInputStream or AVOutputStream structures, which are used in av_encode(). AVInputStream additionally stores timing information, and AVOutputStream additionally stores information related to audio/video synchronization.

4.5 AVPacket
The AVPacket structure, defined below, holds the packet data that has been read:

typedef struct AVPacket {
int64_t pts; ///< presentation time stamp in time_base units
int64_t dts; ///< decompression time stamp in time_base units
uint8_t *data;
int size;
int stream_index;
int flags;
int duration; ///< presentation duration in time_base units (0 if not available)
void (*destruct)(struct AVPacket *);
void *priv;
int64_t pos; ///< byte position in stream, -1 if unknown
} AVPacket;
In av_encode(), the AVInputFormat's (*read_packet)(struct AVFormatContext *, AVPacket *pkt) interface is called to read one frame of data from the input file into the AVPacket member of the current input AVFormatContext.
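In practice this demuxing step is usually driven through av_read_frame() (or the lower-level av_read_packet()); a minimal, hedged sketch of such a read loop:
AVPacket pkt;
while (av_read_frame(ic, &pkt) >= 0) {   /* ic: an opened input AVFormatContext */
    /* pkt.stream_index says which AVStream the packet belongs to;
       pkt.data/pkt.size carry the demuxed bytes of one frame */
    handle_packet(&pkt);                 /* hypothetical per-packet handler */
    av_free_packet(&pkt);                /* release the packet's buffer */
}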

----------------------------------------------------------------------
FFMPEG is currently the most widely used codec library. It supports many popular codecs; it is implemented in C, and it is not only integrated into all kinds of PC software but is also frequently ported to many embedded devices.

Imagining such a codec library in object-oriented terms, the first idea is to build classes for the various codecs, then fix the rules of the data flow on their abstract base classes and transform input objects into output objects according to the algorithms.

In the actual code, these components are divided into three kinds of objects: encoder/decoder, muxer/demuxer and device, corresponding respectively to coding, input/output formats and devices.

At the start of the main function, these three kinds of objects are initialized.

In avcodec_register_all, many codecs are registered, including the H.264 decoder and the X264 encoder:
REGISTER_DECODER (H264, h264);
REGISTER_ENCODER (LIBX264, libx264);
The relevant macros are:
#define REGISTER_ENCODER(X,x) { \
extern AVCodec x##_encoder; \
if(CONFIG_##X##_ENCODER) avcodec_register(&x##_encoder); }
#define REGISTER_DECODER(X,x) { \
extern AVCodec x##_decoder; \
if(CONFIG_##X##_DECODER) avcodec_register(&x##_decoder); }
In this way the code registers libx264_encoder and h264_decoder according to build options such as CONFIG_##X##_ENCODER. Registration happens in the avcodec_register(AVCodec *codec) function, which simply appends the specific codec, libx264_encoder or h264_decoder, to the global linked list first_avcodec. The AVCodec parameter is a structure that can be viewed as the codec base class: it carries not only attributes such as the name and id, but also the following function pointers, which each concrete codec "subclass" implements (a sketch of the registration function itself follows the list):

int (*init)(AVCodecContext *);
int (*encode)(AVCodecContext *, uint8_t *buf, int buf_size, void *data);
int (*close)(AVCodecContext *);
int (*decode)(AVCodecContext *, void *outdata, int *outdata_size,
const uint8_t *buf, int buf_size);
void (*flush)(AVCodecContext *);
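The registration function itself is tiny; a sketch of avcodec_register() (essentially what the utils.c of this era does, minus its one-time initialization call):
void avcodec_register(AVCodec *codec)
{
    AVCodec **p = &first_avcodec;
    while (*p != NULL)
        p = &(*p)->next;     /* walk to the tail of the global list */
    *p = codec;              /* append e.g. libx264_encoder or h264_decoder */
    codec->next = NULL;
}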
Following libx264 further: this is the static X264 encoding library, pulled in as the H.264 encoder when FFMPEG is built. libx264.c contains the following code:
AVCodec libx264_encoder = {
.name = "libx264",
.type = CODEC_TYPE_VIDEO,
.id = CODEC_ID_H264,
.priv_data_size = sizeof(X264Context),
.init = X264_init,
.encode = X264_frame,
.close = X264_close,
.capabilities = CODEC_CAP_DELAY,
.pix_fmts = (enum PixelFormat[]) { PIX_FMT_YUV420P, PIX_FMT_NONE },
.long_name = NULL_IF_CONFIG_SMALL("libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10"),
};
Here the attributes and methods coming from AVCodec are given concrete values. In particular,
.init = X264_init,
.encode = X264_frame,
.close = X264_close,
point the function pointers at concrete functions. These three functions are implemented on top of the API exported by the libx264 static library, i.e. X264's main interface functions.

pix_fmts defines the supported input formats, here 4:2:0:
PIX_FMT_YUV420P, ///< planar YUV 4:2:0, 12bpp, (1 Cr & Cb sample per 2x2 Y samples)
The X264Context seen above wraps the context data that X264 needs:
typedef struct X264Context {
x264_param_t params;
x264_t *enc;
x264_picture_t pic;
AVFrame out_pic;
} X264Context;
It lives in the void *priv_data member of AVCodecContext, which holds each codec's private context. AVCodecContext itself acts like a context base class: it also provides further context attributes, such as the picture resolution and quantizer range, and function pointers such as rtp_callback for the codecs to use.
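To make the adapter role concrete, here is a hedged sketch of the shape of X264_init (the real function in libx264.c maps many more AVCodecContext fields into x264_param_t):
static int X264_init(AVCodecContext *avctx)
{
    X264Context *x4 = avctx->priv_data;   /* allocated via priv_data_size */

    x264_param_default(&x4->params);      /* start from libx264 defaults */
    x4->params.i_width  = avctx->width;   /* copy settings from the context */
    x4->params.i_height = avctx->height;

    x4->enc = x264_encoder_open(&x4->params);
    return x4->enc ? 0 : -1;              /* fail if the encoder won't open */
}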

Back in the main function: once all the codecs, input/output formats and devices have been registered, the contexts are initialized and the coding parameters are read in, and then av_encode() is called to do the actual encoding and decoding work.

Following the comments in that function, its flow is:
1. Initialize the input and output streams.
2. Determine the codecs required by the input and output streams, and initialize them.
3. Write the various parts of the output file.
Let us focus on steps 2 and 3 to see how the codec base class analyzed above is used to achieve polymorphism.

A rough look at the relationships in this code shows that the codec composition in FFMPEG can be drawn as a class diagram; see [3] for the meaning of these structures (see appendix).

A series of functions from utils.c is called here. The avcodec_open() function, which is reached whenever a codec is opened, runs the following code:
avctx->codec = codec;
avctx->codec_id = codec->id;
avctx->frame_number = 0;
if(avctx->codec->init){
    ret = avctx->codec->init(avctx);
    ……
}
This initializes the specific matched codec, and avctx->codec->init(avctx) here invokes the concrete initialization function installed in the AVCodec function pointer, for example X264_init.

avcodec_encode_video() and avcodec_encode_audio() are called from output_packet() to encode audio and video; they likewise use the function pointer avctx->codec->encode() to call the matched encoder's encode function, such as X264_frame, to do the actual work.

From the analysis above we can see how FFMPEG uses object-oriented ideas to abstract codec behaviour, making each codec entity concrete through composition and inheritance.

Suppose we wanted to add a new H265 decoder to FFMPEG. The steps would be as follows (a sketch follows the list):
1. Add CONFIG_H265_DECODER to the build configuration.
2. Register the H265 decoder with the macro.
3. Define an AVCodec h265_decoder variable and initialize its attributes and function pointers.
4. Use the decoder's API to implement h265_decoder's init and other function pointers.
After these steps the new decoder is part of FFMPEG; the external matching and execution rules are handled by the base class's polymorphism.
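A hypothetical sketch of steps 2 and 3 (H265 support does not exist in this FFMPEG; every identifier below is illustrative only):
/* step 2, in allcodecs.c: */
REGISTER_DECODER (H265, h265);

/* step 3, in a new h265dec.c: */
AVCodec h265_decoder = {
    .name           = "h265",
    .type           = CODEC_TYPE_VIDEO,
    .id             = CODEC_ID_H265,        /* would need a new enum value */
    .priv_data_size = sizeof(H265Context),  /* hypothetical private context */
    .init           = h265_decode_init,     /* step 4: wrappers around the */
    .close          = h265_decode_close,    /* decoder library's own API   */
    .decode         = h265_decode_frame,
};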

4. X264 Architecture Analysis
X264 is an open-source H.264 encoder started in 2004 by French university students. It optimizes code for the PC down to the assembly level, and drops features with a poor performance/cost ratio, such as slice groups and multiple reference frames, to raise encoding efficiency. It is the H.264 encoding library that FFMPEG pulls in, and it has also been ported to many embedded DSP platforms.

Section 3 already looked at X264 inside FFMPEG by example; here we go deeper into the X264 framework itself.

Before reading the code, it is worth asking how one would analyze a specific encoder in object-oriented terms: the abstraction over the different entropy-coding algorithms, and over the various intra- and inter-frame estimation algorithms, could all be built as classes.

In X264, the public API and context variables are declared in X264.h. Among the API functions, the auxiliary ones are defined in common.c:
void x264_picture_alloc( x264_picture_t *pic, int i_csp, int i_width, int i_height );
void x264_picture_clean( x264_picture_t *pic );
int x264_nal_encode( void *, int *, int b_annexeb, x264_nal_t *nal );
while the encoding functions are defined in encoder.c:
x264_t *x264_encoder_open ( x264_param_t * );
int x264_encoder_reconfig( x264_t *, x264_param_t * );
int x264_encoder_headers( x264_t *, x264_nal_t **, int * );
int x264_encoder_encode ( x264_t *, x264_nal_t **, int *, x264_picture_t *, x264_picture_t * );
void x264_encoder_close ( x264_t * );
The x264.c file contains the program's main function, which can be read as an example of how to use the API; it too works purely through the API and context variables in X264.h. A sketch of that call sequence follows.
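A minimal, hedged sketch of the encoder.c call sequence that x264.c drives (error handling and NAL output writing omitted):
x264_param_t param;
x264_t *h;
x264_picture_t pic, pic_out;
x264_nal_t *nal;
int i_nal;

x264_param_default(&param);          /* fill in default parameters */
param.i_width  = 320;                /* example resolution */
param.i_height = 240;

h = x264_encoder_open(&param);       /* create the encoder context x264_t */
x264_picture_alloc(&pic, X264_CSP_I420, param.i_width, param.i_height);

/* feed one raw picture; zero or more NAL units come back */
x264_encoder_encode(h, &nal, &i_nal, &pic, &pic_out);

x264_picture_clean(&pic);
x264_encoder_close(h);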

The most important structure in X264 is x264_t, defined in common.h, which records the context data. It holds every H.264-encoding-related variable, from thread-control state to the SPS, the PPS, the quantization matrices and the CABAC contexts.

Among its members are the following structures:
x264_predict_t predict_16x16[4+3];
x264_predict_t predict_8x8c[4+3];
x264_predict8x8_t predict_8x8[9+3];
x264_predict_t predict_4x4[9+3];
x264_predict_8x8_filter_t predict_8x8_filter;
x264_pixel_function_t pixf;
x264_mc_functions_t mc;
x264_dct_function_t dctf;
x264_zigzag_function_t zigzagf;
x264_quant_function_t quantf;
x264_deblock_function_t loopf;
Tracing through, these members are either single function pointers or structures composed of function pointers; this usage looks very much like an interface declaration in object-oriented programming.

These function pointers are initialized in x264_encoder_open(). The initialization first selects different implementations according to the CPU, many of them likely in assembly, to raise execution efficiency.

Second, functions with similar roles are managed together: for example, the four intra16 and the nine intra4 prediction functions are each gathered into arrays of function pointers, as sketched below.
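The pattern is worth a sketch: the tables are filled once in x264_encoder_open() according to the detected CPU, and the hot path then calls through them unconditionally. Identifiers below are illustrative; the real initializers live in common/predict.c and friends:
typedef void (*x264_predict_t)(uint8_t *src);   /* declared like this in x264's headers */

static void predict_16x16_dc_c(uint8_t *src)   { /* portable C version */ }
static void predict_16x16_dc_mmx(uint8_t *src) { /* assembly-backed version */ }

static void example_predict_init(x264_predict_t pf[4+3], int cpu)
{
    pf[0] = predict_16x16_dc_c;          /* default binding */
    if (cpu & X264_CPU_MMX)
        pf[0] = predict_16x16_dc_mmx;    /* override when the CPU allows it */
}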

x264_encoder_encode() is the main function responsible for encoding; within it, x264_slice_write() handles the concrete encoding from the slice layer down, including intra and inter macroblock coding.

Here the choice between CABAC and CAVLC is made on h->param.b_cabac, running x264_macroblock_write_cabac() or x264_macroblock_write_cavlc() respectively to write the bitstream. In this part the functions are grouped into files by role and essentially follow the encoding flow chart; it reads more like procedural code: once the concrete function pointers have been installed, the program simply follows the logic of the encoding process.

Looking at the overall architecture, x264 uses this interface-like form to achieve loose coupling and reuse, and uses the x264_t context that runs through the whole encoder to achieve information encapsulation and polymorphism.

This article has outlined the code architecture of FFMPEG and X264, focusing on how object-oriented design can be expressed in C. Neither project forces C toward C++, yet each has its own implementation style while remaining practical; both are worth borrowing from when planning a C software project.

References
[1] "Illustrating the difference between object-oriented and procedural programming with examples" (用例子說明面向對象和麵向過程的區別)
[2] liyuming1978, "liyuming1978's column" (liyuming1978的專欄)
[3] "FFMpeg Framework Code Reading" (FFMpeg框架代碼閱讀)
Using libavformat and libavcodec
Martin Böhme (boehme@inb.uni-luebeckREMOVETHIS.de)
February 18, 2004
Update (January 23 2009): By now, these articles are quite out of date... unfortunately, I haven't found the time to update them, but thankfully, others have jumped in. Stephen Dranger has a more recent tutorial, ryanfb has an updated version of the code, and David Hoerl has a more recent update.
Update (July 22 2004): I discovered that the code I originally presented contained a memory leak (av_free_packet() wasn't being called). My apologies - I've updated the demo program and the code in the article to eliminate the leak.
Update (July 21 2004): There's a new prerelease of ffmpeg (0.4.9-pre1). I describe the changes to the libavformat / libavcodec API in this article.

The libavformat and libavcodec libraries that come with ffmpeg are a great way of accessing a large variety of video file formats. Unfortunately, there is no real documentation on using these libraries in your own programs (at least I couldn't find any), and the example programs aren't really very helpful either.
This situation meant that, when I used libavformat/libavcodec on a recent project, it took quite a lot of experimentation to find out how to use
them. Here's what I learned - hopefully I'll be able to save others from having to go through the same trial-and-error process. There's also a small demo program that you can download. The code I'll present works with libavformat/libavcodec as included in version 0.4.8 of ffmpeg (the most recent version as I'm writing this). If you find that later versions break the code, please let me know.
In this document, I'll only cover how to read video streams from a file; audio streams work pretty much the same way, but I haven't actually used them, so I can't present any example code.
In case you're wondering why there are two libraries, libavformat and libavcodec: Many video file formats (AVI being a prime example) don't actually specify which codec(s) should be used to encode audio and video data; they merely define how an audio and a video stream (or, potentially, several audio/video streams) should be combined into a single file. This is why sometimes, when you open an AVI file, you get only sound, but no picture - because the right video codec isn't installed on your system. Thus, libavformat deals with parsing video files and separating the streams contained in them, and libavcodec deals with decoding raw audio and video streams.
Opening a Video File
First things first - let's look at how to open a video file and get at the streams contained in it. The first thing we need to do is to initialize libavformat/libavcodec:
av_register_all();
This registers all available file formats and codecs with the library so they will be used automatically when a file with the corresponding
format/codec is opened. Note that you only need to call av_register_all() once, so it's probably best to do this somewhere in your startup cod e. If you like, it's possible to register only certain individual file formats and codecs, but there's usually no reason why you wo uld have to do that. Next off, opening the file:
AVFormatContext *pFormatCtx;
const char *filename="myvideo.mpg";
// Open video file
if(av_open_input_file(&pFormatCtx, filename, NULL, 0, NULL)!=0)
handle_error(); // Couldn't open file
The last three parameters specify the file format, buffer size and format parameters; by simply specifying NULL or 0 we ask libavformat to auto-detect the format and use a default buffer size. Replace handle_error() with appropriate error handling code for your application.
Next, we need to retrieve information about the streams contained in the file:
// Retrieve stream information
if(av_find_stream_info(pFormatCtx)<0)
handle_error(); // Couldn't find stream information
This fills the streams field of the AVFormatContext with valid information. As a debugging aid, we'll dump this information onto standard error, but of course you don't have to do this in a production application:
dump_format(pFormatCtx, 0, filename, false);
As mentioned in the introduction, we'll handle only video streams, not audio streams. To make things nice and easy, we simply use the first video stream we find:
int i, videoStream;
AVCodecContext *pCodecCtx;
// Find the first video stream
videoStream=-1;
for(i=0; i<pFormatCtx->nb_streams; i++)
if(pFormatCtx->streams[i]->codec.codec_type==CODEC_TYPE_VIDEO)
{
videoStream=i;
break;
}
if(videoStream==-1)
handle_error(); // Didn't find a video stream
// Get a pointer to the codec context for the video stream
pCodecCtx=&pFormatCtx->streams[videoStream]->codec;
OK, so now we've got a pointer to the so-called codec context for our video stream, but we still have to find the actual codec and open it:
AVCodec *pCodec;
// Find the decoder for the video stream
pCodec=avcodec_find_decoder(pCodecCtx->codec_id);
if(pCodec==NULL)
handle_error(); // Codec not found
// Inform the codec that we can handle truncated bitstreams -- ie,
// bitstreams where frame boundaries can fall in the middle of packets
if(pCodec->capabilities & CODEC_CAP_TRUNCATED)
pCodecCtx->flags|=CODEC_FLAG_TRUNCATED;
// Open codec
if(avcodec_open(pCodecCtx, pCodec)<0)
handle_error(); // Could not open codec
(So what's up with those "truncated bitstreams"? Well, as we'll see in a moment, the data in a video stream is split up into packets. Since the amount of data per video frame can vary, the boundary between two video frames need not coincide with a packet boundary. Here, we're telling the codec that we can handle this situation.)
One important piece of information that is stored in the AVCodecContext structure is the frame rate of the video. To allow for non-integer frame rates (like NTSC's 29.97 fps), the rate is stored as a fraction, with the numerator in pCodecCtx->frame_rate and the denominator in pCodecCtx->frame_rate_base. While testing the library with different video files, I noticed that some codecs (notably ASF) seem to fill these fields incorrectly (frame_rate_base contains 1 instead of 1000). The following hack fixes this:
// Hack to correct wrong frame rates that seem to be generated by some
// codecs
if(pCodecCtx->frame_rate>1000 && pCodecCtx->frame_rate_base==1)
pCodecCtx->frame_rate_base=1000;
Note that it shouldn't be a problem to leave this fix in place even if the bug is corrected some day - it's unlikely that a video would have a frame rate of more than 1000 fps.
One more thing left to do: Allocate a video frame to store the decoded images in:
AVFrame *pFrame;
pFrame=avcodec_alloc_frame();
That's it! Now let's start decoding some video.
Decoding Video Frames
As I've already mentioned, a video file can contain several audio and video streams, and each of those streams is split up in to packets of a particular size. Our job is to read these packets one by one using libavformat, filter out all those that aren't part of the video stream we're interested in, and hand them on to libavcodec for decoding. In doing this, we'll have to take care of the fact that the bound ary between two frames can occur in the middle of a packet.
Sound complicated? Luckily, we can encapsulate this whole process in a routine that simply returns the next video frame:
bool GetNextFrame(AVFormatContext *pFormatCtx, AVCodecContext *pCodecCtx,
int videoStream, AVFrame *pFrame)
{
static AVPacket packet;
static int bytesRemaining=0;
static uint8_t *rawData;
static bool fFirstTime=true;
int bytesDecoded;
int frameFinished;
// First time we're called, set packet.data to NULL to indicate it
// doesn't have to be freed
if(fFirstTime)
{
fFirstTime=false;
packet.data=NULL;
}
// Decode packets until we have decoded a complete frame
while(true)
{
// Work on the current packet until we have decoded all of it
while(bytesRemaining > 0)
{
// Decode the next chunk of data
bytesDecoded=avcodec_decode_video(pCodecCtx, pFrame,
&frameFinished, rawData, bytesRemaining);
// Was there an error?
if(bytesDecoded < 0)
{
fprintf(stderr, "Error while decoding frame\n");
return false;
}
bytesRemaining-=bytesDecoded;
rawData+=bytesDecoded;
// Did we finish the current frame? Then we can return
if(frameFinished)
return true;
}
// Read the next packet, skipping all packets that aren't for this
// stream
do
{
// Free old packet
if(packet.data!=NULL)
av_free_packet(&packet);
// Read new packet
if(av_read_packet(pFormatCtx, &packet)<0)
goto loop_exit;
} while(packet.stream_index!=videoStream);
bytesRemaining=packet.size;
rawData=packet.data;
}
loop_exit:
// Decode the rest of the last frame
bytesDecoded=avcodec_decode_video(pCodecCtx, pFrame, &frameFinished,
rawData, bytesRemaining);
// Free last packet
if(packet.data!=NULL)
av_free_packet(&packet);
return frameFinished!=0;
}
Now, all we have to do is sit in a loop, calling GetNextFrame() until it returns false. Just one more thing to take care of: Most codecs return
images in YUV 420 format (one luminance and two chrominance channels, with the chrominance channels sampled at half the spatial resolution of the luminance channel). Depending on what you want to do with the video data, you may want to convert this to RGB. (Note, though, that this is not necessary if all you want to do is display the video data; take a look at the X11 Xvideo extension, which does YUV-to-RGB and scaling in hardware.) Fortunately, libavcodec provides a conversion routine called img_convert, which does conversion between YUV and RGB as well as a variety of other image formats. The loop that decodes the video thus becomes:
while(GetNextFrame(pFormatCtx, pCodecCtx, videoStream, pFrame))
{
img_convert((AVPicture *)pFrameRGB, PIX_FMT_RGB24, (AVPicture*)pFrame,
pCodecCtx->pix_fmt, pCodecCtx->width, pCodecCtx->height);
// Process the video frame (save to disk etc.)
DoSomethingWithTheImage(pFrameRGB);
}
The RGB image pFrameRGB (of type AVFrame *) is allocated like this:
AVFrame *pFrameRGB;
int numBytes;
uint8_t *buffer;
// Allocate an AVFrame structure
pFrameRGB=avcodec_alloc_frame();
if(pFrameRGB==NULL)
handle_error();
// Determine required buffer size and allocate buffer
numBytes=avpicture_get_size(PIX_FMT_RGB24, pCodecCtx->width,
pCodecCtx->height);
buffer=new uint8_t[numBytes];
// Assign appropriate parts of buffer to image planes in pFrameRGB
avpicture_fill((AVPicture *)pFrameRGB, buffer, PIX_FMT_RGB24,
pCodecCtx->width, pCodecCtx->height);
Cleaning up
OK, we've read and processed our video, now all that's left for us to do is clean up after ourselves:
// Free the RGB image
delete [] buffer;
av_free(pFrameRGB);
// Free the YUV frame
av_free(pFrame);
// Close the codec
avcodec_close(pCodecCtx);
// Close the video file
av_close_input_file(pFormatCtx);
Done!
Sample Code
A sample app that wraps all of this code up in compilable form is here. If you have any additional comments, please contact me at
boehme@inb.uni-luebeckREMOVETHIS.de. Standard disclaimer: I assume no liability for the correct functioning of the code and techniques presented in this article.
Main flow of the av_encode function (ffmpeg)
av_encode() is the most important function in FFMpeg; encoding, decoding, output and most other work is completed inside it, so it is worth describing its main flow in detail (a skeleton sketch follows the list):

1. input streams initializing
2. output streams initializing
3. encoders and decoders initializing
4. set metadata information from the input file if required (e.g. an mp3's ID3 tag, also called metadata in ffmpeg)
5. write output files header
6. loop of handling each frame
a. read frame from input file
b. decode frame data
c. encode new frame data
d. write new frame to output file
7. write output files trailer
8. close each encoder and decoder
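A hedged C-style skeleton of that flow (the real av_encode() in ffmpeg.c does far more bookkeeping, and its signature is abbreviated here):
static int av_encode(AVFormatContext **output_files, int nb_output_files,
                     AVFormatContext **input_files, int nb_input_files)
{
    /* 1-3: set up each AVInputStream/AVOutputStream, then match and open
       codecs via avcodec_find_decoder()/avcodec_find_encoder() + avcodec_open() */
    /* 4: copy metadata from the input file if required */
    /* 5: output_files[i]->oformat->write_header(...) for each output file */

    for (;;) {                                       /* 6: the per-frame loop */
        AVPacket pkt;
        if (av_read_frame(input_files[0], &pkt) < 0) /* 6a: read one frame */
            break;
        /* 6b: decode via ist->st->codec->codec->decode(...)   */
        /* 6c: encode via ost->st->codec->codec->encode(...)   */
        /* 6d: oformat->write_packet(...) writes the new frame */
        av_free_packet(&pkt);
    }

    /* 7: oformat->write_trailer(...) for each output file */
    /* 8: avcodec_close() on every encoder and decoder */
    return 0;
}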
