This is the second of a two-part series on creating QuickTime movies "from scratch" in Java. By that, I mean we're creating our own media data, piece by piece, to assemble the movie. Doing things at this low level is tricky, but I hope you'll agree after this installment that it's remarkably powerful.
Part 1 began with the structure of a QuickTime movie as a collection of tracks, each of which has exactly one Media object that in turn
references media data that can be in the movie file, in another file, or out on the
network. The Media has tables that indicate how to find specific
"samples," individual pieces of audio, video, text, or other content
to be rendered at a specific time in the movie. Part 1 used easy-to-create
text tracks to show how to build up a Media structure, first by
creating a simple all-text movie and then by adding textual
"time-code" samples as a new text track in an existing movie.
In this part, we'll move on to creating video tracks from scratch, building up a video media object by adding graphic samples.
The goal of this article's sample code is to take a graphic file and make a movie out of it by "moving" around the image — you may have seen this concept in iMovie, where Apple calls it the "Ken Burns Effect," after the director who used it extensively in PBS' The Civil War and other documentaries. There is also a shareware application called Photo to Movie that does much the same thing.
We can make this work because of the concept of persistence of vision, which says that the human eye perceives a series of images, alternated sufficiently quickly, as motion. To do an image-to-movie effect, we show slightly different parts of the picture in each distinct image or "frame," creating the illusion of moving from one part of the picture to another.
VideoMedia

In creating text tracks, the approach was to:

1. Create a track.
2. Add a Media object to it.
3. Get a MediaHandler and use that to add samples to the Media.

The same approach generally works for video, except that the VisualMediaHandler doesn't do anything for us. Instead, we need to create a compression sequence, or CSequence, to prepare samples, encoded and compressed with a codec supported by QuickTime. We'll then add these samples directly to the Media.
The CSequence class has a method called
compressFrame, which is what we need to generate samples. Its
signature is:
public CompressedFrameInfo compressFrame(QDGraphics src,
                                         QDRect srcRect,
                                         int flags,
                                         RawEncodedImage data)
    throws StdQTException
That doesn't look too bad. We just need a QDGraphics as the
source of our image, a rectangle describing what part of the image to use, some
behavior flags, and a RawEncodedImage buffer into which to put the compressed
frame.
GWorld with QDGraphics

"So what's a QDGraphics?" you might be wondering. The name is presumably meant to evoke thoughts of the AWT's Graphics. Indeed, the two are remarkably similar: each represents a drawing surface, either on-screen or off-, containing methods for drawing lines, circles, arcs, polygons, and text strings.
One clever thing that QDGraphics does under the covers is to act as an isolation layer: unless you specifically ask, it hides whether the drawing surface is on-screen or off-screen and which native structures (CGrafPort and GWorld) are involved. One odd side effect of this arrangement is that while there are many getGWorld() methods throughout the QTJ API, there's no GWorld class to return, so you get a QDGraphics instead.
In fact, the GraphicsImporter offers a getGWorld(), and if you guessed that this class offers a way to get an image into QuickTime, you're right. So now we have some idea of how we're going to connect the dots to make a movie from an image:

- A GraphicsImporter can read an image file.
- The GraphicsImporter has a getGWorld() that returns a QDGraphics.
- The QDGraphics can go to CSequence.compressFrame() as the src parameter.
- The RawEncodedImage created by compressFrame can be added to our Media with addSample().

One strategy for getting the frames is to:
1. Get starting and ending rectangles, where a rectangle is a QDRect representing an upper-left corner point and width by height dimensions. (Figure: Step One)
2. Calculate a series of intermediate rectangles that take us from the startRect to the endRect. (Figure: Step Two)
3. For each of these intermediate fromRects, call compressFrame to make a frame from that portion of the original image. Add each frame as a sample. (Figure: Step Three)

If you have QuickTime 5 or better, you can see the result here.
This strategy works, but it is limited by the size of the original image. This is pretty much a fatal flaw. If the image is only slightly larger than the movie size (i.e., the size of the rectangles), there isn't much room to move around. If it's smaller than our movie, then it won't work at all. On the other hand, if the image is much larger than our desired movie dimensions, then we might not be able to get the parts of the picture we want — it's not very useful if we can't get someone's entire face in the movie, and instead settle for a shot that moves from their nose to their chin.
Scaling the image would be a nice improvement, but we can actually do better
than that. If we could scale each fromRect, then we could
"zoom" in or out of the picture by using progressively larger or
smaller source regions. But how do we do this?
Matrix Reloaded

Part 1 demonstrated how QuickTime's Matrix class could be used to define a spatial transformation. Mainly, we used it to move text located at (0,0) to a point at the bottom of a movie, but look at the javadocs and you'll see some intriguing methods, like rotate() and scale().
The key to our improved strategy is a method called rect()
that combines a coordinate mapping with a scaling operation. This allows us
to use any source rectangle and scale it to the size of the frames we're
compressing for the movie.
To make this work, the sample code creates an offscreen QDGraphics and tells the GraphicsImporter to use this for its draw()s. The new QDGraphics's dimensions are the same as those of the frames we intend to compress. That means its bounds are a QDRect with upper-left corner (0,0) and constant dimensions VIDEO_TRACK_WIDTH by VIDEO_TRACK_HEIGHT (which I've set to 360 by 240, but which you're welcome to change in the code). For each intermediate fromRect, we create a Matrix to map from the fromRect to the offscreen QDGraphics's bounds.
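The revised process looks like this:

1. Get starting and ending rectangles, where a rectangle is a QDRect representing an upper-left corner point and width by height dimensions. (Figure: Step One)
2. Calculate a series of intermediate rectangles that take us from the startRect to the endRect. (Figure: Step Two)
3. For each of these intermediate fromRects, use a Matrix to scale the rectangle into the bounds of an offscreen QDGraphics, draw it into the QDGraphics, and then call compressFrame to make a frame from the offscreen QDGraphics. Add each frame as a sample. (Figure: Step Three)
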
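The second step, calculating the intermediate rectangles, can be sketched with plain linear interpolation. This is only an illustration, not necessarily how the sample code does it: the RectInterpolator class and its Rect holder are hypothetical stand-ins that use plain ints rather than QDRect, so the sketch runs without QuickTime installed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; Rect is a plain-int stand-in for QDRect. */
final class RectInterpolator {

    /** Upper-left corner plus width and height. */
    static final class Rect {
        final int x, y, width, height;

        Rect(int x, int y, int width, int height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Linear blend of two ints; t runs from 0.0 (start) to 1.0 (end). */
    private static int lerp(int from, int to, double t) {
        return (int) Math.round(from + (to - from) * t);
    }

    /**
     * Returns one rectangle per frame. The first equals start, the last
     * equals end, and the rest are evenly spaced in between.
     */
    static List<Rect> interpolate(Rect start, Rect end, int frames) {
        if (frames < 2) {
            throw new IllegalArgumentException("need at least 2 frames");
        }
        List<Rect> rects = new ArrayList<Rect>();
        for (int i = 0; i < frames; i++) {
            double t = (double) i / (frames - 1);
            rects.add(new Rect(lerp(start.x, end.x, t),
                               lerp(start.y, end.y, t),
                               lerp(start.width, end.width, t),
                               lerp(start.height, end.height, t)));
        }
        return rects;
    }
}
```

Because the width and height are interpolated along with the corner, a shrinking rectangle produces a zoom in and a growing one a zoom out, once each rectangle is scaled to the frame size.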
Given that strategy, let's step through the code that makes it all work.
We'll skip over creating the movie itself, which we covered last time.
Similarly, creating and adding the VideoTrack and
VideoMedia are a very straightforward analogue to last article's
TextTrack and TextMedia setup.
If this is your first time compiling and running QuickTime for Java
code, see my earlier article, "A Gentle Re-Introduction to QuickTime for Java," for information on how to work out CLASSPATH and Java versioning issues.
To get things started in this example, we need to know the source image
file, as well as the startRect and endRect rectangles
that define the movie we are to make. The sample code expects a makecsequence.properties file to be in the current directory, with
entries that look something like this:
file=/Users/cadamson/Pictures/keagy/DSC01763.jpg
start.x=545
start.y=370
start.width=1500
start.height=1125
end.x=400
end.y=390
end.width=800
end.height=600
If this file is absent, the user will be queried for an image file at runtime, and the rectangles will be chosen randomly.
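For readers who want to see how entries like these might be read, here is a minimal sketch using java.util.Properties. The RectProperties class and its method names are hypothetical and not taken from the sample code; the rectangle comes back as a plain int array rather than a QDRect so that the sketch runs without QuickTime.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper for reading start.* and end.* rectangle entries. */
final class RectProperties {

    /** Parses properties-file text; a real program would load from a file. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
        return props;
    }

    /**
     * Reads prefix.x, prefix.y, prefix.width, and prefix.height and
     * returns them as {x, y, width, height}.
     */
    static int[] readRect(Properties props, String prefix) {
        String[] keys = {"x", "y", "width", "height"};
        int[] rect = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            String name = prefix + "." + keys[i];
            String value = props.getProperty(name);
            if (value == null) {
                throw new IllegalArgumentException("missing property " + name);
            }
            rect[i] = Integer.parseInt(value.trim());
        }
        return rect;
    }
}
```

The four ints returned for "start" and "end" would then be handed to the QDRect constructor, in the same order used elsewhere in this article.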
Given a QTFile for the image file, creating the
GraphicsImporter is quite straightforward:
GraphicsImporter importer = new GraphicsImporter (imgFile);
Next, we create the offscreen QDGraphics and tell the
GraphicsImporter to use it for its drawing:
QDGraphics gw =
    new QDGraphics (new QDRect (0, 0,
                                VIDEO_TRACK_WIDTH,
                                VIDEO_TRACK_HEIGHT));
importer.setGWorld (gw, null);
Notice that I inadvertently called the variable gw, as in
"GWorld". The use of that term in the API and Apple's docs is
really pervasive!
One thing we have to prepare early is a block of memory big enough to hold
the largest possible frame that the chosen video compressor could create. To
do this, we call a getMaxCompressionSize() method, allocate a
block of memory of that size (as referenced by a QTHandle), and
"lock" the handle so it can't move while we're working with it.
Finally, we can create a RawEncodedImage object with this buffer:
int rawImageSize =
    QTImage.getMaxCompressionSize (gw,
                                   gRect,
                                   gw.getPixMap().getPixelSize(),
                                   StdQTConstants.codecNormalQuality,
                                   CODEC_TYPE,
                                   CodecComponent.anyCodec);
QTHandle imageHandle = new QTHandle (rawImageSize, true);
imageHandle.lock();
RawEncodedImage compressedImage =
    RawEncodedImage.fromQTHandle(imageHandle);
The CODEC_TYPE is a constant defined early in the sample code.
It is an int that indicates which QuickTime-supported compression
scheme we've chosen to use, "codec" being the term for a
scheme by which video is encoded and decoded. Many of these are provided as
constants in the StdQTConstants class. Among the popular choices
are:
- kCinepakCodecType. Cinepak is a widely supported codec dating back to the early 90s. However, its image quality and compression ratios aren't very compelling anymore.
- kSorensonCodecType. Sorenson Video pretty much replaced Cinepak for a lot of QuickTime users with its higher quality and great compression.
- kH263CodecType. H.263 is a codec originally designed for videoconferencing but widely used in other environments. It is also supported by Windows Media Player and the Java Media Framework, and is a simple form of MPEG-4 video.
- kAnimationCodecType. A compressor meant for use with synthetic images. Apple's demo code uses this a lot, but that's because their sample apps create their own image data. Our photo doesn't compress well with the Animation codec, so only use it here if you want to be shocked by how big the resulting file is (hint: make sure you have at least 15MB free!).

There are more supported codecs than QTJ lets on, but you have to look in
the native API's ImageCompression.h to find them. Two great
options are:
Sorenson 3. A newer version of the Sorenson codec, Sorenson 3 is available in QuickTime 5
and up. The identifier for this codec is SVQ3, so to create the
int that QuickTime wants, we take the bottom eight bits of each
character (we pretend that we've gone back in time and Unicode doesn't exist
yet). Since S is 0x53, V is
0x56, Q is 0x51, and 3 is
0x33, the int value is 0x53565133.
MPEG-4. You can use MPEG-4 video in a regular QuickTime .mov
container, with the caveat that only QuickTime will be able to read it —
to create a real .mp4 file, you'd need to use a
MovieExporter, as shown in the article
on the QuickTime File Format. For our current purposes, the codec type of
MPEG-4 video is mp4v, which translates to the int
value 0x6d703476.
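The character-packing arithmetic described above is easy to get wrong by hand, so here is a small sketch that does it in code. The FourCharCode class is a hypothetical helper written for this illustration, not part of QTJ or the sample code.

```java
/** Hypothetical helper that packs a four-character code into an int. */
final class FourCharCode {

    /**
     * Takes the bottom eight bits of each of the four characters and
     * packs them into an int, first character in the highest byte.
     */
    static int toInt(String code) {
        if (code.length() != 4) {
            throw new IllegalArgumentException("code must be 4 characters");
        }
        int value = 0;
        for (int i = 0; i < 4; i++) {
            value = (value << 8) | (code.charAt(i) & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(toInt("SVQ3"))); // 53565133
        System.out.println(Integer.toHexString(toInt("mp4v"))); // 6d703476
    }
}
```

With a helper like this, CODEC_TYPE could be defined from the readable string rather than from a hex literal.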
The next thing we do is to create a CSequence. This object
provides us the ability to compress frames. We have to call this with each
frame to compress, in order, and there's an interesting reason for this. If we
were using a compression scheme meant for single images, such as JPEG, we could
do the images in any order, since each frame would have all of the information it
needed to be decompressed and rendered. This is generally not true of
video compression schemes, which often use "temporal compression":
techniques to compress data by eliminating redundant information
between frames, such as an unchanging background. Because of this approach,
decoding a given frame might depend on information from one or more previous
frames, which is why we have to do our compression through an object that
understands that we're working with a series of images.
The CSequence constructor looks like this:
CSequence seq = new CSequence (gw,
                               gRect,
                               gw.getPixMap().getPixelSize(),
                               CODEC_TYPE,
                               CodecComponent.bestFidelityCodec,
                               StdQTConstants.codecNormalQuality,
                               StdQTConstants.codecNormalQuality,
                               KEY_FRAME_RATE,
                               null,
                               0);
These arguments are, in order:

- QDGraphics src: the QDGraphics from which to get image data. In our case, the offscreen GWorld into which we draw.
- QDRect srcRect: the portion of the src to use. In our case, the whole thing.
- int colorDepth: an int indicating the depth (4-bit color, 32-bit color, etc.) at which the frames are likely to be viewed. Pass 0 to let the Image Compression Manager choose for you. More info lives in the docs for the native function.
- int cType: the codec type, as described above.
- CodecComponent codec: often used to request a specific behavior of the given codec, such as the CodecComponent constants bestSpeedCodec, bestFidelityCodec, or bestCompressionCodec.
- int spatialQuality: a quality setting for the images, from codecMinQuality, through low, normal, and high, up to codecMaxQuality and, for codecs that allow it, codecLosslessQuality (all in StdQTConstants).
- int temporalQuality: the quality setting for inter-frame compression, with values as above.
- int keyFrameRate: the maximum number of frames allowed between "key frames," which are the frames that have all of the information they need, and that may be needed for multiple subsequent frames to decompress.
- ColorTable clut: a custom color lookup table, often set to null to let QuickTime use the table from the source image.
- int flags: one or more behavior flags, bitwise OR'd together. One interesting option is codecFlagWasCompressed, which hints that the source image was previously compressed and gives the codec a chance to compensate for the artifacting and other image degradation that occurs when an image has been compressed with a lossy codec (like JPEG).

Once we've created the CSequence, we get an ImageDescription object, which we'll need later when adding samples to the Media.
Now we can start the loop to draw, compress, and add frames. We calculate a
rectangle, fromRect, inside of the original image. This will be the
source of this frame. Next, we create a Matrix that maps and
scales from its original location and size to the offscreen buffer's location
and size; in other words, a rectangle at (0,0) with dimensions
VIDEO_TRACK_WIDTH by VIDEO_TRACK_HEIGHT. Calling
GraphicsImporter.draw() performs the scaled drawing of the region
into the offscreen QDGraphics.
Matrix drawMatrix = new Matrix();
drawMatrix.rect (fromRect, gRect);
importer.setMatrix (drawMatrix);
importer.draw();
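To make the rectangle-to-rectangle mapping less mysterious, here is a sketch of the arithmetic such a mapping implies: a scale on each axis followed by a translation. It is an illustration under my own assumptions, not QuickTime's implementation, and the RectMapping class is a hypothetical stand-in that works on plain doubles.

```java
/** Hypothetical illustration of a scale-and-translate rectangle mapping. */
final class RectMapping {
    final double scaleX, scaleY, offsetX, offsetY;

    /**
     * Builds the mapping that carries the source rectangle's corners
     * onto the destination rectangle's corners.
     */
    RectMapping(double srcX, double srcY, double srcW, double srcH,
                double dstX, double dstY, double dstW, double dstH) {
        scaleX = dstW / srcW;
        scaleY = dstH / srcH;
        offsetX = dstX - srcX * scaleX;
        offsetY = dstY - srcY * scaleY;
    }

    /** Maps a point in source coordinates to destination coordinates. */
    double[] apply(double x, double y) {
        return new double[] { x * scaleX + offsetX, y * scaleY + offsetY };
    }
}
```

For example, mapping the 800 by 600 rectangle at (400,390) onto a 360 by 240 buffer at (0,0) sends the source's upper-left corner to (0,0) and its lower-right corner, (1200,990), to (360,240). The two axes scale independently, so a source rectangle whose aspect ratio differs from the buffer's will be stretched.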
Next, we compress the image that was drawn into the offscreen
QDGraphics:
CompressedFrameInfo cfInfo =
    seq.compressFrame (gw,
                       gRect,
                       StdQTConstants.codecFlagUpdatePrevious,
                       compressedImage);
The arguments to this call are:

- QDGraphics src: the source image to compress.
- QDRect srcRect: what portion of that image to use.
- int flags: behavior flags. Among the most useful is codecFlagUpdatePrevious, which is used for codecs that use temporal compression. Another interesting option not needed here is codecFlagLiveGrab, which you'd use if you were generating images from a live source, possibly image capture, and needed to compress frames as quickly as possible. In the typical QuickTime style, the desired flags are bitwise OR'd together.
- RawEncodedImage data: a RawEncodedImage into which the compressed frame will be written. This is the object we made sure was big enough with that getMaxCompressionSize() call earlier.

The compressFrame call returns a CompressedFrameInfo object, which has an important method called getSimilarity(). This value represents how similar the compressed image is to the one compressed just before it. A value of 255 means the images are identical. 0 means the compressed frame is to be a "key frame," meaning it has all the image data it needs, it does not depend on other frames, and other frames may depend on it. Other values simply represent image difference, where low values mean low similarity.
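The similarity value matters later, when each frame is added to the media and must be flagged as a key frame or not. A minimal sketch of that decision follows. The SyncSampleFlags class is hypothetical, and its MEDIA_SAMPLE_NOT_SYNC constant is a stand-in for StdQTConstants.mediaSampleNotSync whose value of 1 is an assumption made so the sketch runs without QuickTime; real code should use the QTJ constant.

```java
/** Hypothetical helper turning a similarity value into sample flags. */
final class SyncSampleFlags {

    /** Stand-in for StdQTConstants.mediaSampleNotSync (value assumed). */
    static final int MEDIA_SAMPLE_NOT_SYNC = 1;

    /** A similarity of 0 marks a key frame, per the description above. */
    static boolean isKeyFrame(int similarity) {
        return similarity == 0;
    }

    /** Key frames get no flag; every other frame is marked "not sync". */
    static int sampleFlags(int similarity) {
        return isKeyFrame(similarity) ? 0 : MEDIA_SAMPLE_NOT_SYNC;
    }
}
```

In the compression loop, the value passed in would be whatever cfInfo.getSimilarity() returned for the frame just compressed.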
With the frame now compressed into the RawEncodedImage, we can
add a sample to the VideoMedia, with the addSample
method inherited from the Media superclass:
videoMedia.addSample (imageHandle,
                      0,
                      cfInfo.getDataSize(),
                      20,
                      imgDesc,
                      1,
                      (syncSample ?
                       0 :
                       StdQTConstants.mediaSampleNotSync)
                      );
The arguments to this method are:

- QTHandleRef data: a reference to the sample data; in this case, to the RawEncodedImage.
- int dataOffset: an offset into the data. This is 0 in our case, since we're using all of the RawEncodedImage that was populated by compressFrame().
- int dataSize: the number of bytes of data, starting at dataOffset, to use. Again, we're using the whole RawEncodedImage.
- int durationPerSample: how long this sample lasts, expressed in units of the media's timescale. Since our timescale is 600, a duration of 20 equals 1/30th of a second.
- SampleDescription sampleDesc: an object that tells the media what to do with the sample data being passed in. This is why we got an ImageDescription from the CSequence earlier.
- int numberOfSamples: the number of samples provided by this call. For video, this is typically one frame. For other kinds of media, there are some performance considerations described in the native docs.
- int sampleFlags: behavior flags. The interesting value here is whether or not this is a "key frame," also known in QuickTime as a "sync sample." We set the mediaSampleNotSync flag if our earlier call to CompressedFrameInfo.getSimilarity() returned non-zero. Note that failing to set this flag correctly is a common cause of movies that "blur" when scrubbed or played from any point other than the first frame, as explained in an Apple tech note.

Once the loop finishes, we do the same clean-up tasks as with the text-track samples in Part 1: declare that we're done editing and insert the media into the video track:
videoMedia.endEdits();
videoTrack.insertMedia (0, // trackStart
0, // mediaTime
videoMedia.getDuration(), // mediaDuration
1); // mediaRate
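As a quick check on the timing used when adding samples, the arithmetic behind the timescale of 600 and the per-sample duration of 20 can be written out as follows. The MediaTiming class is a hypothetical helper for this illustration only, and the frame count of 90 in the note below is an arbitrary example rather than the number the sample code produces.

```java
/** Hypothetical helper for timescale arithmetic. */
final class MediaTiming {

    /** Frames per second implied by a timescale and per-sample duration. */
    static double framesPerSecond(int timeScale, int durationPerSample) {
        return (double) timeScale / durationPerSample;
    }

    /** Total media duration, in timescale units, for equal-length frames. */
    static int totalDuration(int frameCount, int durationPerSample) {
        return frameCount * durationPerSample;
    }

    /** Converts a duration in timescale units to seconds. */
    static double seconds(int mediaDuration, int timeScale) {
        return (double) mediaDuration / timeScale;
    }
}
```

With a timescale of 600 and a duration of 20, each frame lasts 20/600 of a second, which is 30 frames per second. Ninety such frames would give a media duration of 1800 units, or 3 seconds.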
Finally, we save the movie to disk, exactly as before.
Here, for those with QuickTime 5 or 6, is a videotrack.mov movie produced by the sample code. If you recompile and re-run the code with different codecs and different sizes, you'll see some fairly dramatic differences in file size and image quality. I've used 160x120 to keep the file size small, in order to avoid abusing O'Reilly's bandwidth, and the compression artifacts here are more visible than in the 320x240 version.
Also remember that while we just copied a scaled section of an image into the offscreen buffer, you can do any kind of imaging with this buffer before compressing it into a frame. For example, you could use the drawing commands in the QDGraphics class, or use the QTImageDrawer to draw into the QuickTime world with Java 2D Graphics methods. With some bit-munging, you might even find a way to render 3D graphics from JOGL into QuickTime ... anyone up for rendering Finding Nemo directly into a QuickTime movie?
This completes our tour of QuickTime media structures, in which we've gone from the high-level view of what makes up a movie to the low-level mucking around with individual samples. This is a little "closer to the metal" than QTJ usually requires, but if you believe in keeping simple tasks easy and complex tasks possible, this has been an example of the latter.
Chris Adamson is an author, editor, and developer specializing in iPhone and Mac.
Copyright © 2009 O'Reilly Media, Inc.