How to run a saved TensorFlow Model? (Video Prediction Model)
I was reading this paper. The code is available on GitHub. In the README.md file, they've mentioned how to train the model:
python prediction_train.py
with optional parameters.
Can anyone please explain how I can use this model to predict a video sequence? I'm new to deep learning and TensorFlow, so I'm not able to understand the code properly. My current task is just to run the code and see the output (i.e. the videos predicted by this model).
All I could understand was that it uses the TensorFlow saver to save checkpoints. I'm guessing these checkpoints are the intermediate trained model after a few epochs (2000 in this case). How do I use these models to predict the next frames of a video?
Any help is greatly appreciated :)
Tags: deep-learning, tensorflow
You can start by modifying prediction_train.py into your own script. You need to load a pretrained model by providing the model path in line 48; then run the gen_images op instead of the train_op or summ_op to get the predicted images.
– user12075, Jan 10 at 18:37
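Concretely, the change described in this comment could look roughly like the fragment below, dropped into the script's session loop in place of the training step. This is only a hedged sketch: names such as FLAGS.pretrained_model, model.gen_images and model.iter_num are assumptions about prediction_train.py and should be checked against the actual file.

# Hypothetical fragment inside prediction_train.py (names are assumptions):
saver.restore(sess, FLAGS.pretrained_model)          # load the saved checkpoint instead of training
gen_images = sess.run(model.gen_images,               # run the prediction op ...
                      feed_dict={model.iter_num: 0})  # ... instead of train_op / summ_op
# gen_images now holds the predicted future frames for the input batch.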
2 Answers
The TensorFlow saver is used to save the weights of a specific model at some given point. When you want to use a trained model, you must first define the model's architecture (it must match the one used when the weights were saved), and then you can use the same saver class to restore the weights:
saver = tf.train.Saver()  # assumes the model's graph has already been built
with tf.Session() as sess:
    # Restore variables from disk.
    saver.restore(sess, "../my_saved_model.ckpt")
Regarding your initial question: if you are just starting with deep learning and TensorFlow, I think this is the wrong place to start. You should first understand how TensorFlow works in general by applying it to easier tasks such as image classification (start with MNIST).
From what I understand, you need to use the construct_model function and pass it your initial image sequence (video) and an action tensor, and it should output the predicted frames:
def construct_model(images,
                    actions=None,
                    states=None,
                    iter_num=-1.0,
                    k=-1,
                    use_state=True,
                    num_masks=10,
                    stp=False,
                    cdna=True,
                    dna=False,
                    context_frames=2):
  """Build convolutional lstm video predictor using STP, CDNA, or DNA.

  Args:
    images: tensor of ground truth image sequences
    actions: tensor of action sequences
    states: tensor of ground truth state sequences
    iter_num: tensor of the current training iteration (for sched. sampling)
    k: constant used for scheduled sampling. -1 to feed in own prediction.
    use_state: True to include state and action in prediction
    num_masks: the number of different pixel motion predictions (and
      the number of masks for each of those predictions)
    stp: True to use Spatial Transformer Predictor (STP)
    cdna: True to use Convolutional Dynamic Neural Advection (CDNA)
    dna: True to use Dynamic Neural Advection (DNA)
    context_frames: number of ground truth frames to pass in before
      feeding in own predictions

  Returns:
    gen_images: predicted future image frames
    gen_states: predicted future states
  """
– Mark.F, answered Jan 10 at 15:58
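Putting the two pieces above together, a minimal inference script might look like the sketch below. It is only a sketch under assumptions: it assumes construct_model can be imported from the repository's prediction_model.py, that the model expects 64x64 frames passed as a list of per-time-step tensors, and that the placeholder shapes and checkpoint path shown here are illustrative placeholders; check prediction_train.py for the exact input format.

import numpy as np
import tensorflow as tf
from prediction_model import construct_model  # module name assumed from the repository

SEQ_LEN, BATCH, H, W, C = 10, 1, 64, 64, 3     # illustrative sizes only
ACTION_DIM, STATE_DIM = 4, 3                   # illustrative sizes only

# One placeholder per time step (the repository may expect a different format).
images = [tf.placeholder(tf.float32, [BATCH, H, W, C]) for _ in range(SEQ_LEN)]
actions = [tf.placeholder(tf.float32, [BATCH, ACTION_DIM]) for _ in range(SEQ_LEN)]
states = [tf.placeholder(tf.float32, [BATCH, STATE_DIM]) for _ in range(SEQ_LEN)]

gen_images, gen_states = construct_model(
    images, actions, states, k=-1, use_state=True,
    num_masks=10, cdna=True, context_frames=2)

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'path/to/model.ckpt')  # your trained checkpoint
    feed = {}
    for t in range(SEQ_LEN):
        # Replace the zero arrays with your real video frames, actions and states.
        feed[images[t]] = np.zeros([BATCH, H, W, C], np.float32)
        feed[actions[t]] = np.zeros([BATCH, ACTION_DIM], np.float32)
        feed[states[t]] = np.zeros([BATCH, STATE_DIM], np.float32)
    predicted_frames = sess.run(gen_images, feed_dict=feed)  # list of predicted future frames

Per the docstring, the first context_frames inputs are treated as ground truth and, with k=-1, the model then feeds back its own predictions, so predicted_frames contains the generated continuation of the sequence.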
Thanks! And yes, in parallel I'm also reading about neural networks used to classify MNIST digits at neuralnetworksanddeeplearning.com by Michael Nielsen. But again, that takes some time. My guide has given me this task, so I have to do this now :)
– Nagabhushan S N, Jan 11 at 5:52
Prediction on video is straightforward: use OpenCV to read the video into images (frames) stored in a NumPy array, then load your TensorFlow model and run the prediction on them.
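For the first step mentioned above (reading a video into frames with OpenCV), a minimal self-contained sketch could look like this; the input file name is a placeholder:

import cv2
import numpy as np

cap = cv2.VideoCapture('input_video.mp4')  # placeholder path to your video
frames = []
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break                              # end of the video
    frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # OpenCV reads BGR
cap.release()

video = np.stack(frames)                   # shape: (num_frames, height, width, 3)

The full script below instead applies a frozen TensorFlow detection model frame by frame rather than collecting all frames first.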
"""
Sections of this code were taken from:
https://github.com/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb
"""
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
from utils import label_map_util
from utils import visualization_utils as vis_util
import cv2
# Path to frozen detection graph. This is the actual model that is used
# for the object detection.
PATH_TO_CKPT = '../freezed_pb5_optimized/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('../../../training', 'object-detection.pbtxt')
NUM_CLASSES = 1
sys.path.append("..")
def detect_in_video():
    # VideoWriter creates a copy of the input video with the detection
    # overlays drawn on it. Keep in mind the frame size has to match the
    # original video.
    out = cv2.VideoWriter('pikachu_detection_1v3.avi', cv2.VideoWriter_fourcc(
        'M', 'J', 'P', 'G'), 10, (1280, 720))

    detection_graph = tf.Graph()
    with detection_graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def, name='')

    label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
    categories = label_map_util.convert_label_map_to_categories(
        label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
    category_index = label_map_util.create_category_index(categories)

    with detection_graph.as_default():
        with tf.Session(graph=detection_graph) as sess:
            # Define the input and output tensors for detection_graph.
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Each box represents a part of the image where a particular
            # object was detected.
            detection_boxes = detection_graph.get_tensor_by_name(
                'detection_boxes:0')
            # Each score represents the confidence for the corresponding
            # detection; it is shown on the result image together with the
            # class label.
            detection_scores = detection_graph.get_tensor_by_name(
                'detection_scores:0')
            detection_classes = detection_graph.get_tensor_by_name(
                'detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name(
                'num_detections:0')

            cap = cv2.VideoCapture('PikachuKetchup.mp4')

            while cap.isOpened():
                # Read the next frame; stop when the video ends.
                ret, frame = cap.read()
                if not ret:
                    break

                # Recolor the frame. By default, OpenCV uses BGR color space.
                # This short blog post explains this better:
                # https://www.learnopencv.com/why-does-opencv-use-bgr-color-format/
                color_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                image_np_expanded = np.expand_dims(color_frame, axis=0)

                # Actual detection.
                (boxes, scores, classes, num) = sess.run(
                    [detection_boxes, detection_scores,
                     detection_classes, num_detections],
                    feed_dict={image_tensor: image_np_expanded})

                # Visualization of the results of a detection.
                # Note: perform the detections using a higher threshold.
                vis_util.visualize_boxes_and_labels_on_image_array(
                    color_frame,
                    np.squeeze(boxes),
                    np.squeeze(classes).astype(np.int32),
                    np.squeeze(scores),
                    category_index,
                    use_normalized_coordinates=True,
                    line_thickness=8,
                    min_score_thresh=.20)

                cv2.imshow('frame', color_frame)
                output_rgb = cv2.cvtColor(color_frame, cv2.COLOR_RGB2BGR)
                out.write(output_rgb)

                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break

            out.release()
            cap.release()
            cv2.destroyAllWindows()


def main():
    detect_in_video()


if __name__ == '__main__':
    main()
– Tamil Selvan S