3D Modeling for Reconstructing Buildings from an Internet Photo Perspectives

Introduction

There are billions of photos on the World Wide Web, by far the largest photo collection ever assembled. Using this collection, it is possible to build applications such as 3D modeling, visualization, and localization for sites around the world. To date, the opportunities that internet photos offer remain largely untapped, because most of these images are in a form that is not amenable to processing. Researchers who have tried to work with this resource using conventional techniques have faced several problems: the photos are unorganized and uncalibrated, and they vary widely in illumination, image quality, and resolution. In essence, devising a computer vision technique that works with most of these images has proved to be a challenge.

How, then, can researchers work with this huge resource? This paper proposes solutions based on an image-based rendering algorithm and Structure from Motion. While a few other researchers such as Brown and Lowe (Lowe 395) have used Structure from Motion to tackle the above problems, the technique used in this paper has several modifications. Structure from Motion is effective for 3D visualization and scene modeling and can operate on hundreds of images obtained from keyword queries (photo tourism). Through photo tourism, it is possible to reconstruct many world sites. An algorithm that works effectively on internet photos can therefore enable important applications such as 3D visualization, communication and media sharing, and localization.

Two recent breakthroughs in computer vision, Structure from Motion and feature matching, form the backbone of this paper. Through these techniques, it is possible to reconstruct buildings in 3D and offer virtual, interactive tours to internet users.
It is also possible to evaluate the current state of a building and identify degradation and areas that may require renovation or reconstruction. Further, we can create or display a model of any building of interest as long as we have images of it.

Sparse geometry and camera reconstruction

The browsing and visualization components of this system require accurate information about the relative location, orientation, and intrinsic parameters (such as focal length) of each photo in a collection, as well as sparse 3D scene geometry. The system also requires a geo-referenced coordinate frame. For the most part, this information can be obtained from electronic devices such as Global Positioning System receivers, or over the internet. The EXIF tags in image files often carry this data, though in the vast majority of cases it is inaccurate. As such, this system computes the information using computer vision techniques. First, we detect feature points in every image, after which the system matches feature points between pairs of images. Finally, the system runs an iterative Structure from Motion procedure to recover the camera parameters. Because the Structure from Motion procedure produces only relative estimates and our system requires absolute coordinates, the result is later aligned to a geo-referenced frame. The whole procedure is detailed below.

Feature points are detected using the SIFT keypoint detector (Lowe 411), which has good invariance to image transformations. The next step is matching keypoint descriptors using approximate nearest neighbors. For instance, to match two images I and J, we first create a kd-tree from the feature descriptors in J. Next, for each feature in I, we find the nearest neighbor in J using the kd-tree. For efficiency, we can use ANN's priority search algorithm, limiting each query to visiting a maximum of two hundred bins in the kd-tree.
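The nearest-neighbor lookup described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: it builds a plain kd-tree over the descriptors of image J and runs an exact nearest-neighbor query for each descriptor of image I, whereas the system described here relies on the ANN library's approximate priority search with a cap of two hundred bins. The function names build_kdtree, nearest, and match_features are hypothetical, and descriptors are represented as short tuples rather than real 128-dimensional SIFT vectors.

```python
from math import dist


def build_kdtree(points, depth=0):
    """Recursively build a kd-tree over (index, descriptor) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][1])  # cycle through the dimensions
    points = sorted(points, key=lambda p: p[1][axis])
    mid = len(points) // 2  # the median element becomes the split node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (distance, index) of the stored descriptor closest to query."""
    if node is None:
        return best
    idx, vec = node["point"]
    d = dist(query, vec)
    if best is None or d < best[0]:
        best = (d, idx)
    diff = query[node["axis"]] - vec[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


def match_features(desc_i, desc_j):
    """For each descriptor in I, find the index of its nearest neighbor in J."""
    tree = build_kdtree(list(enumerate(desc_j)))
    return [(i, nearest(tree, q)[1]) for i, q in enumerate(desc_i)]
```

Calling match_features(desc_i, desc_j) returns one (index in I, index in J) pair per feature of I. A production system would swap the exact search for an approximate one, since exact kd-tree queries become slow in high-dimensional descriptor spaces.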
To decide which of these matches to keep, we use the ratio test described by Lowe (Lowe 95). In this technique, for each feature descriptor in I, we find the two nearest neighbors in J, with distances d1 and d2, where d1 is the smaller. We accept the match if d1/d2 is less than 0.6. If several features in I match the same feature in J, we discard all of these matches, since some of them must be spurious.

After matching the features for an image pair (I, J), we estimate a fundamental matrix for the pair using RANSAC, an iterative technique described by Fischler and Bolles (Fischler et al. 390). In each RANSAC iteration, we compute a candidate fundamental matrix using the 8-point algorithm as described by Hartley and Zisserman (Hartley et al. 200). The problem is normalized to improve robustness to noise. The RANSAC outlier threshold is set to 0.6% of the maximum image dimension. The F-matrix returned by RANSAC is refined by running the Levenberg-Marquardt algorithm (Nocedal et al. 97), minimizing the error over all inliers to the F-matrix. Finally, we remove matches that are outliers to the recovered F-matrix using the same threshold. If fewer than 20 matches remain, we discard all of the matches for that pair.

Once we have found a consistent set of matches, the matches are organized into tracks. A track is a set of matching keypoints across several images. A track that contains more than one keypoint in the same image is deemed inconsistent. With the tracks in place, we can create an image connectivity graph in which each image is a node and an edge exists between any pair of images with matching features.

The next step is Structure from Motion. We need to recover camera parameters such as focal length, rotation, and translation for each photo. These parameters must be consistent, i.e. the reprojection error should be minimized. This problem can be solved using bundle adjustment. We start by estimating the parameters of a single pair of cameras.
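The ratio test and the rejection of many-to-one matches described earlier in this section can be sketched as follows. This is a minimal illustration, not the system's implementation: the function name ratio_test_matches and its max_ratio parameter are hypothetical, the two nearest neighbors are found by brute force rather than with a kd-tree, and descriptors are plain tuples. Only the 0.6 ratio comes from the text.

```python
from collections import Counter
from math import dist


def ratio_test_matches(desc_i, desc_j, max_ratio=0.6):
    """Match features of image I to image J using the d1/d2 ratio test."""
    candidates = []
    for i, query in enumerate(desc_i):
        # Rank every descriptor in J by distance (brute force for clarity).
        ranked = sorted((dist(query, d), j) for j, d in enumerate(desc_j))
        if len(ranked) < 2:
            continue  # the test needs a second-nearest neighbor
        (d1, j1), (d2, _) = ranked[0], ranked[1]
        # Accept only if the best match is clearly better than the runner-up.
        if d2 > 0 and d1 / d2 < max_ratio:
            candidates.append((i, j1))
    # If several features in I claim the same feature in J, drop all of them.
    claims = Counter(j for _, j in candidates)
    return [(i, j) for i, j in candidates if claims[j] == 1]
```

The surviving pairs would then be passed to the fundamental-matrix estimation step, which removes the remaining geometric outliers.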
This initial pair should not only have a large number of matches but also a large baseline, so that the initial two-frame reconstruction can be robustly estimated. We therefore select the pair of photos that has the largest number of matches, subject to the condition that the matches cannot be well modeled by a single homography. This helps avoid degenerate cases such as coincident cameras. A homography between each pair of matching photos is estimated using RANSAC, with an outlier threshold of 0.4% of the maximum image dimension.

The next step is adding another camera to the optimization. We choose the camera that observes the largest number of tracks whose 3D locations have already been estimated. Next, we initialize the new camera's extrinsic parameters using the direct linear transform (DLT) technique inside a RANSAC procedure, as explained by Hartley and Zisserman (Hartley et al. 187). For this RANSAC step, an outlier threshold of 0.4% of the maximum image dimension can be used. Apart from providing a good estimate of the camera's rotation and translation, the direct linear transform returns an upper-triangular matrix K, which can be used to estimate the camera's intrinsic parameters. We use K together with the focal length estimated from the photo's EXIF tags to initialize the focal length of the new camera. Starting from this initial set of parameters, we run a bundle adjustment step in which only the new camera and the points it observes are allowed to change; the rest of the model remains fixed.

The final step is adding points observed by the new camera to the optimization. A point is added if at least one other recovered camera observes it and if triangulating the point gives a well-conditioned estimate of its location.
The conditioning is assessed by considering all pairs of rays that could be used to triangulate the point and finding the pair with the maximum angle of separation. If this angle is larger than a threshold (2.0 degrees was used in the experiment), the point is triangulated. This check tends to reject points at infinity. While such points can be very useful for estimating accurate camera rotations, they can occasionally cause problems: triangulating points at infinity with noisy camera parameters can result in points with erroneous, finite 3D locations. Once the new points have been added, a global bundle adjustment is run to refine the whole model. The sparse bundle adjustment library described by Lourakis and Argyros is used to find the minimum-error solution. This procedure is repeated, one camera at a time, until no remaining camera observes enough reconstructed 3D points to be reliably reconstructed.

Typically, the entire process may take from three hours to twelve days, depending on the size and complexity of the image collection. This calls for a few adjustments to speed up the process. The system was applied to several sites, including the Colosseum in Rome, the Great Wall of China, Prague, and Notre Dame in Paris. For Notre Dame, 2,635 photos were used and the whole process took about 12 days. For the Colosseum, 1,994 photos were used and the process took about 5 days. For the Great Wall of China, 120 photos were used and the process took about 3 hours. For Prague, 197 photos were used and the process took about 3 hours.

The Structure from Motion procedure estimates relative camera locations.
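The conditioning check used when adding points, in which a point is triangulated only if some pair of viewing rays is separated by more than the threshold angle, can be sketched as follows. This is an illustrative sketch under assumptions: the function names max_ray_angle_deg and well_conditioned are hypothetical, each ray is given as a non-zero 3D direction vector from a camera center toward the point, and only the 2.0 degree threshold is taken from the text.

```python
from itertools import combinations
from math import acos, degrees, sqrt


def _unit(v):
    """Scale a non-zero vector to unit length."""
    norm = sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def max_ray_angle_deg(rays):
    """Largest angle, in degrees, between any pair of viewing rays."""
    best = 0.0
    for a, b in combinations([_unit(r) for r in rays], 2):
        # Clamp the dot product so rounding error cannot push it outside [-1, 1].
        cos_angle = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
        best = max(best, degrees(acos(cos_angle)))
    return best


def well_conditioned(rays, min_angle_deg=2.0):
    """Decide whether a point seen along these rays should be triangulated."""
    return len(rays) >= 2 and max_ray_angle_deg(rays) > min_angle_deg
```

Nearly parallel rays, as produced by very distant points, give a small maximum angle and are rejected, which matches the behavior described for points at infinity.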
The final step of the location estimation is to optionally align the model with a geo-referenced image or map, such as a satellite image, a digital elevation map, or a floor plan. This determines the absolute geocentric coordinates of every camera. Most features of the photo explorer can work with relative coordinates, while others require absolute coordinates. In theory, the estimated camera locations are related to the absolute locations by a similarity transform. To establish the correct transformation, the user interactively translates, rotates, and scales the model until it agrees with the provided map or image. The 3D points, camera locations, and lines are superimposed on the alignment image, rendered in an orthographic projection with the camera placed above the scene and pointed downward. If the up vector is estimated properly, the user only needs to rotate the model in 2D.

Sometimes the recovered scene cannot be aligned to a geo-referenced coordinate system through a similarity transform. This happens if the Structure from Motion process fails to obtain a metric reconstruction of the scene. It can also happen as a result of low-frequency drift in the recovered point and camera locations. Generally, these error sources do not have a large impact on the navigation controls, since they are small and hard to notice. They may, however, prove problematic if accurate models are required. The recovered scene can be straightened out by pinning down a sparse set of control points. Alternatively, the user can manually identify correspondences between cameras or points and locations in an image or map.

Challenges faced in research and suggestions for improvement

Structure from Motion faces many challenges, and more research is required in this field. Many world sites, cities, towns, and buildings are being photographed every day.
With all these photos, it is now possible to reconstruct a significant portion of the world using Structure from Motion, but this requires matching and reconstruction algorithms that can operate at internet scale, on billions of images. Some recent matching algorithms have made progress toward this, though there is room for improvement. Advances in this area will help improve the accuracy of Structure from Motion, especially on large image collections. Also, the huge redundancy among images on the internet means that only a small portion of them is needed to produce a quality reconstruction.

Structure from Motion provides only sparse geometry. Computing accurate camera parameters opens the door to techniques such as multi-view stereo, which can compute dense surface shape models. Though some advances in online algorithms have been achieved, there is still room for improvement.

Another key challenge facing Structure from Motion is variability. Whereas SIFT and other matching techniques are robust to some appearance variation, more significant appearance changes still pose a major problem. More robust matching algorithms would enable a number of capabilities, such as matching satellite or aerial views to ground-based photos, aligning photos of natural sites across changes in season or weather, registering historical images and artistic renderings with modern views, and robust matching of imagery from low-quality devices.

Pros and cons of the system

Structure from Motion has some key advantages and disadvantages. With this technique, faster or even real-time results may be obtainable. Imagine being able to point your mobile device and automatically be told your location, with the view annotated with information about the objects in your vicinity. The technique also offers a good level of accuracy, which is vital in navigation, localization, and surveillance.
Most Structure from Motion methods operate by minimizing reprojection error and do not guarantee metric accuracy. However, maps, satellite images, digital elevation models (DEMs), and similar sources provide a rich supply of metric data. Using such data, accurate metric Structure from Motion results can be obtained.

The system enables users to add material to a scene in a number of ways. First, users can register their own photos to the scene at run time once the initial set of photographs has been registered. Second, users can annotate regions of the images they have added, and these annotations can then be propagated to the other photographs.

On the downside, specific equipment, software, and personnel are required to carry out Structure from Motion. This can prove expensive and prohibitive for smaller installations. 3D scene visualization is one of the most fascinating aspects of reconstruction. However, the sparse points produced by Structure from Motion may not yield compelling views.

Conclusion and evaluation of work

In this paper, we examined how Structure from Motion works. The paper first identified the problems researchers face when using internet photos for 3D visualization. Next, we proposed solutions to these problems and showed how the Structure from Motion technique can be used to address them. The paper also discussed some of the challenges faced in research and suggestions for further improving the technique. We further outlined a few advantages and disadvantages of the technique. Internet imagery provides a rich foundation for computer vision research. The huge amount of data on the World Wide Web is starting to be used to address several long-standing problems in computer graphics, vision, and object category recognition. We should expect major strides in this field in the next few years.

Based on how thoroughly I have gone through this paper, a grade of A should suffice.
I have addressed all aspects of the paper, including but not limited to the problem description and solution, a critique of the proposed solution, the clarity and readability of the paper, and the conclusion. I have also looked at other papers that relate to this field.

Works cited

Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.
Furukawa, Y., & Ponce, J. Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Hartley, R. I. (1997). In defense of the eight-point algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6), 580–593.
Horn, B. (1986). Robot vision. Cambridge, MA: MIT Press.
Levoy, M., & Hanrahan, P. (1996). Light field rendering. In SIGGRAPH conference proceedings (pp. 31–42).
Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Nocedal, J., & Wright, S. J. (1999). Numerical optimization. Springer series in operations research. New York: Springer.
Snavely, N., Seitz, S. M., & Szeliski, R. Modeling the world from internet photo collections. International Journal of Computer Vision.
Toyama, K., Logan, R., & Roseway, A. (2003). Geographic location tags on digital images. In Proceedings of the international conference on multimedia.
