Categories
Capuchino Research Ý tưởng

[Deep Learning] Deploy flowers detection

A while back I played around with transfer learning to train a model to recognize 5 types of flowers.

After that I wanted to turn it into a web app so everyone could use it, and in the end I decided to build the web app with Python's Flask. Here is the result ^^
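The post doesn't include the serving code, so below is only a minimal sketch of how a transfer-learned classifier might be wrapped in a Flask route. Everything here (model file name, class labels, input size, port) is an assumption for illustration, not the actual app.

# Minimal Flask inference sketch (hypothetical names; not the real deployment)
import numpy as np
from flask import Flask, request, jsonify
from PIL import Image
from tensorflow.keras.models import load_model

app = Flask(__name__)
model = load_model('flower_model.h5')   # assumed: model saved after transfer learning
CLASSES = ['daisy', 'dandelion', 'rose', 'sunflower', 'tulip']   # assumed class order

@app.route('/flower_detection', methods=['POST'])
def predict():
    # Read the uploaded image, resize to the network's expected input size
    img = Image.open(request.files['file']).convert('RGB').resize((224, 224))
    x = np.expand_dims(np.asarray(img, dtype='float32') / 255.0, axis=0)
    probs = model.predict(x)[0]
    return jsonify({'flower': CLASSES[int(np.argmax(probs))],
                    'confidence': float(np.max(probs))})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=999)   # port matching the link below is an assumption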

http://rose.dothanhlong.org:999/flower_detection

https://www.youtube.com/watch?v=DFUNUWa_aEA
Categories
Sói's Tutorials Tài liệu

AngularJS IndexedDB Cheatsheet by soiqualang :v

angularjs_indexedDB

My git repo

https://github.com/soiqualang/angularjs_indexedDB

angularjs_indexedDB

These are my notes on working with IndexedDB.

I used the library from here:

https://github.com/bramski/angular-indexedDB

AngularJS IndexedDB Cheatsheet

Declare App

var app = angular.module('myApp', ['indexedDB']);

Create Database

Declare an object store:

db.createObjectStore(storeName, options);

Declare an index for searchable fields:

store.createIndex(indexName, keyPath, options);

app.config(function ($indexedDBProvider) {
$indexedDBProvider
  .connection('myIndexedDB')
  .upgradeDatabase(1, function(event, db, tx){
	var objStore = db.createObjectStore('people', {keyPath: 'id',autoIncrement: true});
	// The primary key is 'id' and it auto-increments
	//store.createIndex(indexName, keyPath, options);
	objStore.createIndex('name_idx', 'name', {unique: false});
	objStore.createIndex('country_idx', 'country', {unique: false});
  });
});

Insert

Insert single record

$indexedDB.openStore('people', function(store){
	store.insert({"name": "John Doe2", "country": "Norway"}).then(function(e){
		console.log('Inserted');
	});
});

Insert an array

$indexedDB.openStore('people', function(store){
	var names = [
		{name:'Jani',country:'Norway'},
		{name:'Carl',country:'Sweden'},
		{name:'Margareth',country:'England'},
		{name:'Hege',country:'Norway'},
		{name:'Joe',country:'Denmark'},
		{name:'Gustav',country:'Sweden'},
		{name:'Birgit',country:'Denmark'},
		{name:'Mary',country:'England'},
		{name:'Kai',country:'Norway'}
	];
	store.insert(names).then(function(e){
		// do something
	});
});

Select all records

$indexedDB.openStore('people', function(store){
	store.getAll().then(function(peoples) {
		// Assign the result to the scope
		$scope.objects = peoples;
	});
});

Get all keys

$indexedDB.openStore('people', function(store){
	store.getAllKeys().then(function(e){
		$scope.primaryKeys = e;
	});
});

Delete

Delete all

$indexedDB.openStore('people', function(store){
	store.clear().then(function(){
		// do something
	});
});

Delete a record

$indexedDB.openStore('people', function(store){
	//delete obj id=2
	store.delete(2).then(function(){
		// do something
	});
});

Count records

$indexedDB.openStore('people', function(peoples){
	peoples.count().then(function(e){
		console.log(e);
	});
});

Search – Find

Search item where id=9

$indexedDB.openStore('people', function(peoples){
	peoples.find(9).then(function(e){
		console.log(e);
	});
});	

Build a query

  • $lt(value) – less than
  • $gt(value) – greater than
  • $lte(value) – less than or equal
  • $gte(value) – greater than or equal
  • $eq(value) – equal
  • $between(lower, upper, doNotIncludeLowerBound, doNotIncludeUpperBound) – between two bounds (the two boolean flags are optional)
  • $desc(unique) – descending order
  • $asc(unique) – ascending order
  • $index(value) – name of index

Find all records with country = “Norway”

$indexedDB.openStore('people', function(peoples){
	var find = peoples.query();
	//find=find.$eq(5);
	//find=find.$lt(3);
	//console.log(find);
	//find = find.$index("name_idx");
	//Find in country, key = Norway
	find=find.$index("country_idx");
	find=find.$eq("Norway");
	//update scope
	peoples.eachWhere(find).then(function(e){
		console.log(e);
		$scope.people = e;
	});
});

That's all, I hope it's useful for you ^^

(But I'm still looking for a way to do substring searches on text, like SQL's LIKE "%blabla%" @@)
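IndexedDB indexes only support key-range lookups, so a true "%blabla%" substring match isn't available out of the box. A common workaround is a "starts with" search built on $between from the query API above; this is only a sketch of that trick, not full substring matching (for that you would still have to getAll() and filter in JavaScript):

$indexedDB.openStore('people', function(peoples){
	var find = peoples.query();
	find = find.$index("country_idx");
	// "Starts with" trick: match every key from "Nor" up to "Nor" + highest code point
	find = find.$between("Nor", "Nor" + '\uffff');
	peoples.eachWhere(find).then(function(e){
		$scope.people = e;
	});
});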


https://itnext.io/indexeddb-your-second-step-towards-progressive-web-apps-pwa-dcbcd6cc2076

https://github.com/bramski/angular-indexedDB

https://gist.github.com/inexorabletash/a279f03ab5610817c0540c83857e4295

Categories
Sói's Tutorials

Learning AngularJS

Lately I've had a project that requires it, so I've decided to learn AngularJS in order to move page rendering from the back-end to the front-end.

So I created this GitHub repo; for learning material I'm just going straight through W3Schools =)). Anyone who wants to learn is welcome to join in ^^


https://github.com/soiqualang/Learn_AngularJS


https://soiqualang.github.io/Learn_AngularJS/

Basically, I find that AngularJS saves me a huge amount of scripting work. Here are some screenshots from my tinkering :p

Categories
Công nghệ Linux Tài liệu

Google Colab configuration

Google Colab configuration

https://colab.research.google.com

Google Colab server information

Internet speed test
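The original post showed these checks as screenshots. As a rough sketch, these are the kinds of commands you might run in a Colab cell to inspect the VM and test the connection (speedtest-cli is a third-party package, so installing it is an assumption):

# Inspect the Colab VM
!cat /proc/cpuinfo | grep "model name" | head -1
!free -h
!df -h /
!nvidia-smi          # only works when a GPU runtime is selected

# Rough internet speed test
!pip install -q speedtest-cli
!speedtest-cli --simple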

The fiber-optic cable is down today :'(

soiqualang_chentreu

Categories
Ý tưởng

[Idea] Building an app to look up 2,000 disease treatment protocols

[Idea] Building an app to look up 2,000 disease treatment protocols

A while ago I wrote a tool to crawl nearly 2,000 disease treatment protocols from major hospitals across the country.

I shared it here:

Out of nowhere I got the idea of turning this data into an app so everyone can look up these treatment protocols for reference ^^ P/s:…

Posted by Long Đỗ on Saturday, May 18, 2019

However, back then I wrote everything out to Word, so the file is really heavy, more than 8,000 pages. Opening it or searching it is a real pain @@!

(Looking back, that was a pretty dumb move :v)

Then last week Facebook's memories reminded me of it, and I suddenly had the idea of building an app to look up the 2,000 treatment protocols from major hospitals nationwide, so they're easy for everyone to use and share :v

Preparing the data

The problem is that back then my only goal was to export to Word so people could use it easily. Only now do I see how short-sighted that was: the exported Word file is heavy (over 8,000 pages), hard to read, and hard to search. Even just opening it requires a fairly powerful machine, not to mention that turning it into an app now runs into countless headaches.

=> Lesson learned:

Next time I crawl data, split it into separate files and store it straight away as HTML or in a database; only afterwards convert it to whatever format is needed for sharing! Remember that!!! @@

To get the data into a database, I need to split each treatment protocol into its own record, with its own title, content and medical specialty.

The problem is that a docx file can't be pushed straight into a database (I'm using MySQL for speed); surely I'm not going to copy/paste all 2,000 protocols in by hand =.=!

That idea is off the table, since I don't have the time and I don't like doing things that manually anyway.

So the plan is to convert that docx file to HTML and then write a tool to extract the data from it.

Docx2HTML

https://cloudconvert.com/docx-to-html

The file was too big and the conversion failed @@

Let's try another tool; if that doesn't work either, I'll probably just find a library and write my own conversion tool =))

https://medium.com/@kelvintran/convert-word-to-html-using-python-with-python-mammoth-5590b96b434f
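The article above uses python-mammoth; if the online converters had all failed, a fallback along these lines would likely have done the job (only a sketch, with input/output file names assumed):

# Convert the big docx to HTML with python-mammoth (pip install mammoth)
import mammoth

with open("phacdo.docx", "rb") as docx_file:           # assumed input file name
    result = mammoth.convert_to_html(docx_file)

with open("phacdo.html", "w", encoding="utf-8") as out:
    out.write(result.value)                            # the generated HTML

print(result.messages)                                 # conversion warnings, if any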

Luckily, though, the next converter worked fine:

https://document.online-convert.com/convert-to-html

The result: from that Word file I got a 26 MB HTML file :v

Extracting the data

Creating the database

https://stackoverflow.com/questions/415801/allowed-memory-size-of-33554432-bytes-exhausted-tried-to-allocate-43148176-byte

Extracting the data

Writing a function to split the text

Fixing bugs endlessly (string handling is the most tiring part @@)

In the end I almost got it right, but the following issues remained:

  • Extra whitespace (worth fixing, since it affects search)
  • A stray "h1" string left behind, because some h1 tags are broken (a closing tag with no opening tag)
  • Links to the source articles are still embedded (back then I kept them so readers could check the original source, but that site is dead now)

So I need a function that collapses the extra whitespace and strips the stray "h1" text and the links, as sketched below.
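A rough sketch of such a cleanup step in Python (the original tool may well have been PHP, judging by the memory-limit link above; the regex patterns here are assumptions):

import re

def clean_protocol(html_fragment):
    """Collapse extra whitespace and strip stray h1 tags and links (assumed patterns)."""
    text = html_fragment
    # Drop leftover links but keep their visible text
    text = re.sub(r'<a\b[^>]*>(.*?)</a>', r'\1', text, flags=re.S | re.I)
    # Remove malformed h1 tags, the source of the stray "h1" text
    text = re.sub(r'</?h1[^>]*>', '', text, flags=re.I)
    # Collapse runs of whitespace (including newlines) into single spaces
    text = re.sub(r'\s+', ' ', text)
    return text.strip()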

Insert data into the database

Writing a function to push the data into the database, sketched below.
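A minimal sketch of what that insert step could look like with Python and MySQL (table name, column names and credentials are hypothetical; the original code may have been PHP):

import pymysql

conn = pymysql.connect(host='localhost', user='root', password='secret',
                       database='phacdo_db', charset='utf8mb4')   # assumed credentials

records = [('Protocol title', 'Specialty', '<p>Cleaned HTML content...</p>')]

with conn.cursor() as cur:
    cur.executemany(
        "INSERT INTO phacdo (title, linhvuc, noidung) VALUES (%s, %s, %s)",
        records)
conn.commit()
conn.close()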

Let's test a query on one of the protocols =))

Building the interface and the search feature

Ta-da! And here is the app's interface; it's a bit rough, but with a little patience you can search just fine :v

Detail view of a treatment protocol

The introduction section ^^

"Attaching" it to the web

Give the app a modest little subdomain and then… just share it :v

http://phacdo.dothanhlong.org/

Added per-protocol links so you can bookmark or share a protocol ^^

5/2019
soiqualang_chentreu

Categories
Công nghệ Ý tưởng

Moving Viettabs to a new home :v

A while back, after cleaning up my hard drive, I shared more than 3,000 Guitar Pro tabs, and along with that came the idea of building them into a web app that's easy for everyone to use.

At the time I registered the domain viettabs.cf because it was short and easy to remember. But later I kept forgetting the renewal date, so it kept getting deleted. Since it's a free domain, they don't hold it for you :v

So I decided to move it to a dothanhlong.org subdomain, which is easier to manage ^^

So the new domain is:

http://viettabs.dothanhlong.org/

You can also access it directly from the app menu on my blog:

http://viettabs.dothanhlong.org/

Here is its familiar interface ^^

I'm still keeping the old domain name as a keepsake :v

http://viettabs.dothanhlong.org/

Some earlier posts where I shared this guitar-tabs idea ^^

https://dothanhlong.org/chia-se-download-hon-3000-tabs-guitar-pro/
https://dothanhlong.org/y-tuong-xay-dung-trang-chia-se-tab-so-guitar-pro/
https://dothanhlong.org/tutorial-part-1-xay-dung-viettabs-cf-dang-ky-hosting-domain-upload-data/
https://dothanhlong.org/tutorial-part-2-xay-dung-viettabs-cf-thu-vien-web-doc-file-guitar-tab-pro/
https://dothanhlong.org/tutorial-part-3-xay-dung-viettabs-cf-thiet-ke-import-du-lieu-vao-database/
Categories
GIS Sói's Tutorials Tài liệu Uncategorized WebGIS

Some WebGIS demos

soiqualang_chentreu

Categories
GIS Sói's Tutorials WebGIS

Install GeoServer 2.15 with Tomcat 8

Install GeoServer 2.15 with Tomcat 8

Install Tomcat and the Tomcat admin app

sudo apt install default-jdk

sudo apt-get install tomcat8

sudo apt-get install tomcat8-admin

Configure Tomcat

Go to the config directory

cd /var/lib/tomcat8/conf/

vim tomcat-users.xml
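To use the Tomcat manager app you need a user with the manager roles; a typical minimal tomcat-users.xml entry looks something like this (the username and password are placeholders to replace with your own):

<tomcat-users>
  <role rolename="manager-gui"/>
  <role rolename="admin-gui"/>
  <!-- Replace the credentials below with your own -->
  <user username="admin" password="change_me" roles="manager-gui,admin-gui"/>
</tomcat-users>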

Install the GeoServer app

Copy geoserver.war into the Tomcat webapps directory:

/var/lib/tomcat8/webapps/
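For example (assuming the downloaded geoserver.war sits in the current directory):

sudo cp geoserver.war /var/lib/tomcat8/webapps/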

Restart Tomcat

sudo /etc/init.d/tomcat8 restart

Remove Tomcat

sudo apt remove --purge tomcat8 tomcat8-docs
sudo apt autoremove
sudo apt autoclean

See also

[Tutorial] Install Tomcat 7 and GeoServer on linux (kubuntu distro)

Categories
GIS

[Saved] How we did it: End-to-end deep learning in ArcGIS

How we did it: End-to-end deep learning in ArcGIS

Oil and gas is a huge industry in the United States, and is currently experiencing a boom in the Permian Basin. This oil-rich region stretches from western Texas to eastern New Mexico. Each day, hundreds of new well pads appear across the landscape, making it difficult for regulators to keep up. But unregistered well pads are both a safety hazard and a missed opportunity for revenue for agencies such as the Bureau of Land Management.

At the plenary session of this year’s Esri Developer Summit, we demonstrated an end-to-end deep learning workflow to find unregistered well pads, using ArcGIS Notebooks. This can help regulators monitor the progress of new drilling on their land as well as look for potential illegal drilling.

Well Pads detected using deep learning. The ones highlighted in blue are not currently listed in the permits database.

The full workflow, from exporting training data and training a deep learning model to detecting objects across a large landscape, can be done using the ArcGIS API for Python. This blog article, originally written as an ArcGIS Notebook, shows how we did this with the help of the arcgis.learn module.

Geospatial deep learning

The field of artificial intelligence (AI) has progressed rapidly in recent years, matching or, in some cases, even surpassing human accuracy. Broadly speaking, AI is the ability of computers to perform a task that typically requires some level of human intelligence. Machine learning is one type of engine that makes this possible, using data-driven algorithms to learn from data and give you the answers that you need. One type of machine learning that has emerged recently is deep learning. Deep learning refers to deep neural networks, which are inspired by and loosely resemble the human brain.

The arcgis.learn module includes tools that support machine learning and deep learning workflows with geospatial data. This blog post focuses on deep learning with satellite imagery.

Applying Computer Vision to geospatial imagery

One area of AI where deep learning has done exceedingly well is computer vision, i.e. the ability for computers to ‘see’. This is particularly useful for GIS, as satellite, aerial and drone imagery is being produced at a rate that makes it impossible to analyse and derive insight from through traditional means. Object detection and pixel classification are among the most important computer vision tasks and are particularly useful for spatial analysis.

  • Object Detection involves finding objects within an image as well as their location in terms of bounding boxes. Finding what is in satellite, aerial or drone imagery, and where, and plotting it on a map can be used for infrastructure mapping, anomaly detection and feature extraction.
  • Pixel Classification, also referred to as image segmentation, involves classifying each pixel of an image as belonging to a particular class. In GIS, segmentation can be used for Land Cover Classification or for extracting roads or buildings from satellite imagery.

ArcGIS has tools to help with every step of the deep learning workflow including data preparation and exploratory data analysis, training deep learning models, deploying them for inferencing and finally disseminating results using web layers and maps and driving field activity.

ArcGIS Pro includes tools for labeling features and exporting training data for deep learning workflows, and has been enhanced for deploying trained models for feature extraction or classification. ArcGIS Image Server in the ArcGIS Enterprise 10.7 release has similar capabilities and allows deploying deep learning models at scale by leveraging distributed computing. ArcGIS Notebooks provide one-click access to pre-configured Jupyter Notebooks along with the necessary deep learning libraries and a gallery of starter notebooks that show how deep learning models can be easily trained and deployed.

The arcgis.learn module

The arcgis.learn module in the ArcGIS API for Python enables GIS analysts and data scientists to easily adopt and apply deep learning in their workflows. It enables training state-of-the-art deep learning models with a simple, intuitive API. By adopting the latest research in deep learning, it allows for much faster training and removes guesswork from the deep learning process. It integrates seamlessly with the ArcGIS platform by consuming the exported training samples directly, and the models that it creates can be used directly for inferencing (object detection and pixel classification) in ArcGIS Pro and Image Server.

This module includes methods and classes for:

  • Exporting Training Data
  • Data Preparation
  • Model Training
  • Model Management
  • Inference

Prerequisites

Data preparation, augmentation and model training workflows using arcgis.learn have a dependency on PyTorch and fast.ai deep learning libraries.

If you are using ArcGIS Notebook Server, the dependencies are already installed.

In the ArcGIS Pro 2.3 Python environment, the dependencies need to be installed using these commands:

conda install -c conda-forge spacy
conda install -c pytorch pytorch=1.0.0 torchvision
conda install -c fastai fastai=1.0.39
conda install -c arcgis arcgis=1.6.0 --no-pin

Otherwise, in a new conda environment, issue the following commands:

conda install -c fastai -c pytorch -c esri fastai=1.0.39 pytorch=1.0.0 torchvision arcgis=1.6.0

Object Detection with arcgis.learn

Deep learning models ‘learn’ by looking at several examples of imagery and the expected outputs. In the case of object detection, this requires imagery as well as known (or labelled) locations of objects that the model can learn from. With the ArcGIS platform, these datasets are represented as layers, and are available in our GIS.

In the workflow below, we will be training a model to identify well pads from Sentinel-2 imagery. Sentinel-2 is an Earth observation mission developed by ESA as part of the Copernicus Programme to perform terrestrial observations in support of services such as forest monitoring, land cover change detection, and natural disaster management.

In this analysis, data downloaded from https://earthexplorer.usgs.gov/ has been used to create a hosted image service in our GIS. The code below connects to our GIS and accesses the known well pad locations and the Sentinel imagery:

from arcgis.gis import GIS
from arcgis.raster.functions import apply
from arcgis.learn import export_training_data

gis = GIS("home")

# layers we need - the input to generate training samples and the imagery
well_pads = gis.content.get('ae6f1c62027c42b8a88c4cf5deb86bbf')  # Well pads layer
well_pads

# Sentinel-2 imagery published to portal
sentinel_item = gis.content.get("15c1069f84eb40ff90940c0299f31abc")
sentinel_item

Exporting Training Samples

The export_training_data() method generates training samples for training deep learning models, given the input imagery, along with labeled vector data or classified images. Deep learning training samples are small subimages, called image chips, and contain the feature or class of interest. This tool creates folders containing image chips for training the model, labels and metadata files and stores them in the raster store of your enterprise GIS. The image chips are often small, such as 256 pixel rows by 256 pixel columns, unless the training sample size is larger. These training samples support model training workflows using the arcgis.learn package as well as by third-party deep learning libraries, such as TensorFlow or PyTorch.

The object detection models in arcgis.learn accept training samples in the PASCAL_VOC_rectangles (Pattern Analysis, Statistical Modeling and Computational Learning, Visual Object Classes) format. The PASCAL VOC dataset is a standardized image dataset for object class recognition. The label files are XML files and contain information about image name, class value, and bounding boxes.

The models in arcgis.learn take advantage of pretrained models that have been trained on large image collections, such as ImageNet, and fine tune them on satellite imagery. Pretrained models like these are excellent feature extractors and can be fine-tuned relatively easily on another task or different imagery without needing as much data. However, since the photographs that these models have been trained on contain only 3 channels (Red, Green, Blue), we cannot take advantage of all the bands available in multispectral imagery, and need to pick 3.

The extract_bands() method can be used to specify which 3 bands should be extracted for fine tuning the models. In our analysis, we will be using the pre-configured 'Natural Color with Dynamic Range Adjustment (DRA)' raster function:

sentinel_data = apply(sentinel_item.layers[0], 'Natural Color with DRA', astype='U8')

For better training, image chips should be exported with a larger size than that used for training the models. This allows arcgis.learn to perform random center cropping as part of its default data augmentation, so the model sees a different sub-area of each chip during training, leading to better generalization and avoiding overfitting to the training data. By default, a chip size of 448 x 448 pixels works well, but this can be adjusted based on the amount of context you wish to provide to the model, as well as the amount of GPU memory available.

Here, we are exporting the training data for our model in the well_pads folder:

export_training_data(sentinel_data, well_pads, "PNG",
                     {"x": 448, "y": 448}, {"x": 224, "y": 224},
                     "PASCAL_VOC_rectangles", 75,
                     "well_pads")

Data Preparation

Once the training samples have been exported, they need to be fed into the model for training. Data preparation can be a time consuming process that involves collating and massaging the training chips and labels into the specific format needed by each deep learning model.

Typical data processing pipelines involve splitting the data into training and validation sets, applying various data augmentation techniques, creating the necessary data structures for loading data into the model, setting the appropriate batch size and so on. arcgis.learn automates all these time consuming tasks, and the prepare_data() method can directly read the training samples exported by ArcGIS. The prepare_data() method inspects the format of the training samples exported by the export_training_data tool in ArcGIS Pro or Image Server (whether for object detection or pixel classification) and constructs the appropriate fast.ai DataBunch from it. This DataBunch consists of training and validation DataLoaders with the specified transformations for data augmentations, chip size, batch size, and split percentage for the train-validation split.

By default, prepare_data uses a default set of transforms for data augmentation that work well for satellite imagery. These transforms randomly rotate, scale and flip the images so the model sees a different image each time. This helps the model generalize better and not just 'remember' or overfit to the specific images in the training set. Alternatively, users can compose their own transforms using fast.ai transforms for the specific data augmentations they wish to perform.

from arcgis.learn import prepare_data

data = prepare_data('/arcgis/directories/rasterstore/well_pads',
                    {0: 'Pad'})

The show_batch() method can be used to visualize the exported training samples, along with labels, after data augmentation transformations have been applied.

data.show_batch()

Model Training

arcgis.learn includes support for training deep learning models for object detection. Support for training pixel classification models is coming in the next release.

The models in arcgis.learn are based upon pretrained Convolutional Neural Networks (CNNs, or in short, convnets) that have been trained on millions of common images such as those in the ImageNet dataset for image classification tasks. These CNNs (such as Resnet, VGG, Inception, etc.) can classify what's in an image by basing their decision on features that they learn to identify in those images. In particular, they use a hierarchy of layers, with the earlier layers learning to identify simple features like edges and blobs, middle layers combining these primitive features to identify corners and object parts, and the later layers combining the inputs from these in unique ways to grasp what the whole image is about (i.e. the semantic meaning). The final layer in a typical convnet is a 'fully connected' layer that looks at all the extracted semantic meaning in the form of feature maps across the whole image and essentially does a weighted sum of these to come up with a probability of each object class (whether it's an image of a cat or a dog, or whatever).

A convnet trained on a huge corpus of images such as ImageNet is thus considered as a ready-to-use feature extractor. We could replace the last few layers of these convnets and substitute them with something else that uses those features for other useful tasks, such as object detection and pixel classification.

The arcgis.learn module is based on PyTorch and fast.ai and enables fine-tuning of pretrained torchvision models on satellite imagery. Pretrained models like these are excellent feature extractors and can be fine-tuned relatively easily on another task or different imagery without needing as much data. The arcgis.learn models leverage fast.ai's learning rate finder and one-cycle learning, which allows for much faster training and removes guesswork from the deep learning process.

arcgis.learn includes the SingleShotDetector model (based on Fast.ai MOOC Version 2 Lesson 9) for object detection tasks. A pretrained convnet, like ResNet, acts as the 'backbone' upon which the SingleShotDetector model is based, or as the 'encoder' part of the upcoming UnetClassifier.

Object Detection using SingleShotDetector

Once we have a good image classifier, a simple way to detect objects is to slide a ‘window’ across the image and classify whether the image in that window (cropped out region of the image) is of the desired type. However, this is terribly inefficient as we need to look for objects everywhere in the image, and at different scales, as the objects might be larger or smaller. This requires multiple passes of regions of the image through the image classifier which is computationally infeasible. Another class of object detection networks (like R-CNN and Fast(er) R-CNN) use a two stage approach — first to identify regions where objects are expected to be found and then running those region proposals through the convnet for classifying and creating bounding boxes around them.

The latest generation of object detection networks such as YOLO (You Only Look Once) and SSD (Single-Shot Detector) use a fully convolutional approach in which the network is able to find all objects within an image in one pass (hence ‘single-shot’ or ‘look once’) through the convnet.

“SSD: Single Shot MultiBox Detector”, 2015; arXiv:1512.02325.

Instead of using a region proposal networks to come up with candidate locations of prospective objects, the Single Shot MultiBox Detector (on which the SingleShotDetector is modeled) divides up the image using a grid with each grid cell responsible for predicting which object (if any) lies in it and where.

Backbone SSD uses a pre-trained image classification network as a feature extractor. This is typically a network like ResNet trained on ImageNet, from which the final fully connected layers, used to come up with the predicted class of an input image, have been removed. We are thus left with a deep neural network that is able to extract semantic meaning from the input image while preserving the spatial structure of the image, albeit at a lower resolution. For ResNet34 the backbone results in 256 7×7 'feature maps' of activations for each input image. Each of these 256 feature maps can be interpreted as a grid of 7×7 activations that fire up when a particular feature is detected in the image. In the SSD architecture, one or more convolutional layers are added to this backbone and the outputs are interpreted as the bounding boxes and classes of objects in the spatial location of the final layer's activations.

Receptive Field Convolutional neural networks preserve the spatial structure of an image because of the way the convolution operation is applied. A learnable filter slides over the image from left to right and top to bottom and the activations represent how similar that part of the image is to the filter. Each activation in the output feature map is thus 'looking at' that region of the previous feature map (and ultimately the image, because a deep CNN has multiple such convolutional layers). The part of the image that is ultimately responsible for an activation in a feature map is referred to as the 'receptive field' of that activation. Each activation in the output feature map has 'seen' that part of the image more than any other activation, and it is natural to expect that activation to contain the most information needed to detect objects in its receptive field. This is the central premise of the SSD architecture.

As it's possible for multiple objects to occupy a grid cell, and for the objects to have different sizes or aspect ratios, each grid cell has several assigned anchor boxes (also known as prior boxes), one for each possible object size and aspect ratio within that grid cell. SSD uses a matching phase while training, to match the appropriate anchor box with the bounding boxes of each ground truth object within an image. Essentially, the anchor box with the highest degree of overlap with an object is responsible for predicting that object's class and its location. This property is used for training the network and for predicting the detected objects and their locations once the network has been trained.

Having a knowledge of the SingleShotDetector architecture and how the anchor boxes are specified using grid cells, aspect ratios and zoom levels allows one to design a suitable model for the object detection task at hand. If the objects you are detecting are all of roughly the same size, you can simplify the network architecture by using just one scale of the anchor boxes. A simpler network is easier to train. More powerful networks can detect multiple overlapping objects of varying sizes and aspect ratios, but need more data and computation for training.

Grid cells A simple way to detect multiple objects in an image is to divide the image using a grid and have each grid cell be responsible for detecting objects in that region of the image. Detecting objects simply means predicting the class (type) and location of an object within that region. If no object is present, we consider it as the background class and the location is ignored.

In the SSD architecture, we add additional convolutional layers to the backbone network and architect the additional layers in such a manner that the spatial size of the final layer is the same as the size of the grid we are using. The depth of the final feature map is used to predict the class of the object within the grid cell and its bounding box. This allows SSD to be a fully convolutional network that is fast and efficient, while taking advantage of the receptive field of each grid cell to detect objects within that grid cell.

For instance, we could use a 4×4 grid to detect objects in an image, when we see that their size is such that approximately 16 of them could occupy an image chip (4 on a side).

Such an SSD architecture can be created using:

ssd = SingleShotDetector(data, grids=[4], zooms=[1.0], ratios=[[1.0, 1.0]])

The grids parameter specifies the size of the grid cell, in this case 4×4. Additionally, we are specifying a zoom level of 1.0 and an aspect ratio of 1.0:1.0. What this essentially means is that the network will create an anchor box (or prior box, as it's known in other places) for each grid cell, which is the same size as the grid cell (zoom level of 1.0) and is square in shape with an aspect ratio of 1.0:1.0. The output activations along the depth of the final feature map are used to shift and scale this anchor box (within a reasonable limit) so it can approach the actual bounding box of the object even if it doesn't exactly line up with the anchor box.

We might be interested in several layers or hierarchies of grid cells. For example, we could use a 4×4 grid to find smaller objects, a 2×2 grid to find mid sized objects and a 1×1 grid to find objects that cover the entire image. That can be done by specifying [4, 2, 1] as the grids parameter.

Zoom levels/scales

Cars and Pools have different scales

It is not necessary for the anchor boxes to have the same size as the grid cell. We might be interested in finding smaller or larger objects within a grid cell. The zooms parameter is used to specify how much the anchor boxes need to be scaled up or down with respect to each grid cell.

Aspect ratios

Not all objects are square in shape. Some are longer and some are wider, by varying degrees. The SSD architecture allows pre-defined aspect ratios of the anchor boxes to account for this. The ratios parameter can be used to specify the different aspect ratios of the anchor boxes associated with each grid cell at each zoom/scale level.

Having multiple anchor boxes per grid cell with different aspect ratios and at different scales, while also allowing for multiple hierarchies of grid cells results in a profusion of potential anchor boxes that are candidates for matching the ground truth while training, and for prediction.

Creating SingleShotDetector Model

Since the image chips visualized in the section above indicate that most well pads are roughly of the same size and square in shape, we can keep an aspect ratio of 1:1 and zoom (scale) of 1. This will help simplify the model and make it easier to train. Also, since the size of well pads in the image chips is such that approximately nine could fit side by side, we can keep a grid size of 9.

We then create a Single Shot Detector with a specified grid size, zoom scale and aspect ratio:

from arcgis.learn import SingleShotDetector

ssd = SingleShotDetector(data, grids=[9], zooms=[1.0], ratios=[[1.0, 1.0]])

Finding the optimum learning rate

Once the appropriate model has been constructed, it needs to be trained over several epochs, or training passes over the training data. This process involves setting the optimum learning rate. Picking a very small learning rate leads to slow training of the model, while picking one that is too high can prevent the model from converging and 'overshoot' the minima, where the loss (or error rate) is lowest. arcgis.learn includes fast.ai's learning rate finder, accessible through the model's lr_find() method, that helps in picking the optimum learning rate, without needing to experiment with several learning rates and picking from among them.

ssd.lr_find()

The learning rate is specified using two numbers – a lower rate for fine tuning the earlier layers of the pretrained backbone, and the higher rate for training the newly added layers for the task at hand. The higher learning rate can be deduced by inspecting the learning rate graph and picking the highest learning rate (on the x axis) where the loss is still going down (while still being lower than the point from where it shoots up). The lower learning rate is usually a fraction (one tenth works well) of the higher rate but can be adjusted depending upon how different the imagery is from natural images on which the backbone network is trained.

In the chart above we find that the loss is going down steeply at 2e-02 (0.02) and we pick that as the higher learning rate. The lower learning rate is approximately one tenth of that. We choose 0.001 to be more careful not to disturb the weights of the pretrained backbone by too much. This is why we are picking a learning rate of slice(0.001, 0.02) to train the model in the next section.

Training the model

Training the model is an iterative process. We can train the model using its fit() method till the validation loss (or error rate) continues to go down with each epoch (or training pass over the data). This is indicative of the model learning the task.

ssd.fit(10, slice(0.001, 0.02))

As each epoch progresses, the loss (error rate, that we are trying to minimize) for the training data and the validation set are reported. In the table above we can see the losses going down for both the training and validation datasets, indicating that the model is learning to recognize the well pads. We continue training the model for several iterations like this till we observe the validation loss starting to go up. That indicates that the model is starting to overfit to the training data, and is not generalizing well enough for the validation data. When that happens, we can try reducing the learning rate, adding more data (or data augmentations), increasing regularization by increasing the dropout parameter in the SingleShotDetector model, or reducing the model complexity.

Unfreezing the backbone and fine-tuning

By default, the earlier layers of the model (i.e. the backbone or encoder) are frozen and their weights are not updated when the model is being trained. This allows the model to take advantage of the (ImageNet) pretrained weights for the backbone, and only the 'head' of the network is trained initially. Once the later layers have been sufficiently trained, it helps to improve model performance and accuracy to unfreeze() the earlier layers and allow their weights to be fine-tuned to the nuances of the particular satellite imagery, compared to the photos of everyday objects (from ImageNet) that the backbone was trained on. The learning rate finder can be used to identify the optimum learning rate between the different training phases.
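The notebook code for this step isn't reproduced in the article; a plausible sketch of the unfreeze-and-fine-tune cycle with the model defined above would look like this (the learning rate values are illustrative only):

# Unfreeze the pretrained backbone so its weights can also be fine-tuned
ssd.unfreeze()

# Re-run the learning rate finder for this training phase
ssd.lr_find()

# Train a few more epochs with a lower, discriminative learning rate (illustrative values)
ssd.fit(5, slice(1e-4, 1e-3))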

Visualizing results

The results of how well the model has learnt can be visually observed using the model’s show_results() method. The ground truth is shown in the left column and the corresponding predictions from the model on the right. As we can see below, the model has learnt to detect well pads fairly well. In some cases, it is even able to detect the well pads that are missing in the ground truth data (due to inaccuracies in labeling or the records).

ssd.show_results(rows=25, thresh=0.05)

Saving trained model

Once you are satisfied with the model, you can save it using the save() method. This creates an Esri Model Definition (EMD file) that can be used for inferencing in ArcGIS Pro as well as a Deep Learning Package (DLPK zip) that can be deployed to ArcGIS Enterprise for distributed inferencing across a large geographical area. Saved models can also be loaded back using the load() method, for further fine tuning.

ssd.save('WellPadDetector')

> Created model files at /arcgis/directories/rasterstore/well_pads/models/WellPadDetector

Deploying model

Once a model has been trained, it can be added to ArcGIS Enterprise as a deep learning package.

trained_model = '/arcgis/directories/rasterstore/well_pads/models/WellPadDetector/WellPadDetector.zip'

model_package = gis.content.add(item_properties={
        "type": "Deep Learning Package",
        "typeKeywords": "Deep Learning, Raster",
        "title": "Well Pad Detection Model",
        "tags": "deeplearning",
        "overwrite": 'True'},
    data=trained_model)

model_package

Model lifecycle management

The arcgis.learn module includes the install_model() method to install the uploaded model package (*.dlpk) to the raster analytics server.

Optionally, after inferencing the necessary information from the imagery using the model, the model can be uninstalled using uninstall_model(). The deployed models on an Image Server can be queried using the list_models() method.

The uploaded model package is installed automatically on first use as well. We can query the settings of the deep learning model using the query_info() method.

from arcgis.learn import Model

detect_objects_model = Model(model_package)
detect_objects_model.install()
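The article mentions list_models(), query_info() and uninstall_model() without showing them; a short sketch of how those calls might look (method names as described above; treat the exact signatures as assumptions):

from arcgis.learn import list_models

# List the deep learning models currently deployed on the raster analytics server
for m in list_models(gis=gis):
    print(m)

# Inspect the settings of the installed model
print(detect_objects_model.query_info())

# Remove the model from the server when it is no longer needed
detect_objects_model.uninstall()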

Detecting Objects

The detect_objects() function can be used to generate feature layers that contain bounding boxes around the objects detected in the imagery data, using the specified deep learning model.

Note that the deep learning library dependencies need to be installed separately on the image server as well.

For arcgis.learn models, the following sequence of commands in ArcGIS Image Server’s Pro Python environment install the necessary dependencies:

conda install -c conda-forge spacy
conda install -c pytorch pytorch=1.0.0 torchvision
conda install -c fastai fastai=1.0.39
conda install -c arcgis arcgis=1.6.0 --no-pin

We specify the geographical extent and imagery cell size for feature extraction, and whether to use the GPU or CPU, in the context parameter. Each detection has an associated score that indicates how confident the model is about that prediction. We can set a score threshold to filter out false detections. In this case, we found that we can lower the score threshold to 0.05 and catch more detections without having too many false detections. A non-max suppression (nms_overlap) parameter can be specified to weed out duplicate overlapping detections of the same object.

context = {'cellSize': 10,
           'processorType': 'GPU',
           'extent': {'xmin': -11587791.393960,
                      'ymin': 3767970.198031,
                      'xmax': -11454320.817016,
                      'ymax': 3875304.476397,
                      'spatialReference': {'latestWkid': 3857, 'wkid': 102100}}}

params = {'padding': '0', 'threshold': '0.05', 'nms_overlap': '0.1', 'batch_size': '64'}

Finally, the code below shows how we can use distributed raster analytics to automate object detection across a large geographical area and create a feature layer of well pad detections.

from arcgis.learn import detect_objects

detected_pads = detect_objects(input_raster=sentinel_data,
                               model=detect_objects_model,
                               model_arguments=params,
                               output_name="Well_Pads_Detect_full3",
                               context=context,
                               gis=gis)

detected_pads

Visualizing detection layer

We can visualize the results using the map widget, right within the notebook.

web_map = gis.content.search("title: Well Pad Detection AND owner:portaladmin", item_type="Web Map")[0]

map_widget = gis.map(web_map)

map_widget.extent = {'spatialReference': {'latestWkid': 3857, 'wkid': 102100},
                     'xmin': -11397184.938845266,
                     'ymin': 3761693.7641860787,
                     'xmax': -11388891.521276105,
                     'ymax': 3764082.4213200537}

map_widget.zoom = 15
map_widget

We could take these results, share them as maps and layers, do further analysis to find which well pads are missing in the database, where the hotspots of new drilling activity are, and how they are changing over time. With Workforce for ArcGIS, we can create assignments for mobile workers, such as inspectors, and drive field activity. In conclusion, ArcGIS has end-to-end support for deep learning: from hosting the data, to exporting training samples and training a deep learning model, to detecting objects across a large region and driving field activity.

https://medium.com/geoai/how-we-did-it-end-to-end-deep-learning-in-arcgis-dd5b10d87b8
Categories
GIS

[Saved] Parking Lot Vehicle Detection Using Deep Learning

The Era of Drones

Although the idea of vehicle detection is not a groundbreaking one and has been around since the emergence of video cameras and embedded sensors, these methods were often marred by high capital and maintenance costs and a high complexity from having to integrate multiple data sources, each with a limited band of inputs. The prevalence of drones in the commercial market in recent years on the other hand, has brought about a new era of state-of-the-art aerial photogrammetry and a drastic reduction in the cost of obtaining aerial data. With this sudden increase in information, and by combining machine learning with GIS technologies, we are now capable of performing new and insightful analyses on issues of interest.

Existing business problems which stand to benefit from this include customer flow analyses and demographic modelling. This is particularly useful for those in the retail sector looking to monitor peak business hours by counting the number of parked vehicles at a given time, and also to extrapolate useful customer information (such as income, marital status, household size and even political inclination) by classifying the types of vehicles they own.

So, can we solve these problems using AI and GIS? The answer is yes, and a good starting point would be to come up with a workflow that tallies the number of cars per unit time as well as infers the vehicle type for every positive detection. In this article, I aim to give a comprehensive overview of such a workflow: from data acquisition and processing using Drone2Map, to performing data inferencing using TensorFlow and ArcGIS Pro, and finally to creating actionable BI visualizations using the Operations Dashboard in ArcGIS Online (AGOL).

Complete vehicle detection/classification workflow, all training was carried out on the GeoAI Data Science Virtual Machine (DSVM) on Azure, which offers a ready-to-use environment for training machine learning models along with ArcGIS Pro pre-installed.

Data Collection & Exploratory Analysis using Drone2Map

To obtain some sample data, we flew a drone over a busy parking lot here at our office in Redlands, California and obtained a series of geo-tagged tiff files with geolocation corroborated by 5 ground control points (GCPs) to ensure the result would be accurate enough to identify the correct parking space for each vehicle. These images were captured along a “lawn mower” flight path with an overlap of 70% along flight lines and at least 60% between flight lines. The reason we do this is to facilitate the generation of true orthomosaics. Traditional orthos (or “frame” orthos) suffer from what is known as the “layover effect”, where tall structures such as trees or buildings seemingly “lean” toward or away from the observer as a consequence of stitching together disparate frames that do not capture objects at a true nadir perspective. This effect worsens for objects that are at the edges of a drone’s field of view.

Using Drone2Map, we can take these individual frames and resolve for an object’s true ortho by finding common views between a frame and its 8 adjacent frames in a point cloud and then keeping the views that have a high degree of overlap. The resultant orthomosaic is not only a true ortho, but one that does not reveal seamlines between images typical of frame orthos.

Of course, all of this is automated, and the actual image processing step is simple: Create a 2D mapping project in Drone2Map, pull in your data, add ground control points as needed and then hit start to generate a true orthomosaic.

2D orthomosaic visualized in Drone2Map

From this, we obtain 3 classes of output products: A 2D orthomosaic of our parking lot, a digital surface model (DSM) layer and a digital terrain model (DTM) layer. An initial thought was to simply pass the DSM to a detection network to produce bounding boxes on distinctly “car-like” protrusions. However, on closer inspection of the dataset we identified some potential issues with this approach: in this particular parking lot, the coverage of foliage was so extensive as to affect the detection of certain cars partially or completely hidden by overhanging branches and leaves.

DSM view with partially obstructed parking spaces

The overhanging vegetation affected both the DSM and orthomosaic, but since each image includes oblique view angles at its edges, some images were able to view partially or completely underneath the tree canopy. ArcGIS also enables each image from the drone to be orthorectified. Following photogrammetric processing in Drone2Map, each image could be analyzed in its proper geospatial placement, providing multiple views of each parking space.

Orthomosaic that offers a better view of partially covered vehicles

Processing of these oblique views was beyond the scope of this initial project, but will be the subject of future testing. In addition, parts of the canopy depicted by the orthomosaic are not fully opaque, and the RGB bands may also provide additional channels that would allow correct subdivision of vehicles into categories such as trucks, sedans and SUVs. From a data collection standpoint, it is also much easier to collect pure ortho imagery than outfitting drones with LIDAR sensors for DSM/DTM detection.

Simple Approach: Building a classification model using InceptionV3

Our first solution for tackling this problem was an obvious one: Simply overlay a polygon layer from a mapped parking lot on top of the orthomosaic raster and clip out cars using the Split Raster geoprocessing tool to get our prediction set. This was very easily done.

Then comes the question of which classification model to apply atop which finetuning set. A simple off-the-shelf model that's available from both TensorFlow Slim and TensorFlow Hub is InceptionV3. Based on the original InceptionNet (Szegedy et al.), this third revision bears much resemblance in terms of core structure to the original, with similar component modules. However, it adds factorization methods to reduce the representational bottleneck, as well as label smoothing and batch norm operations on the auxiliary classifiers to increase regularization.

As with most TensorFlow Hub models, there is no need to train from scratch when we can apply transfer learning; Luckily, InceptionV3 was pretrained on ImageNet.

As for the finetuning set, the Cars Overhead With Context (COWC) dataset from LLNL was an easy pick for its richly annotated set of 32,716 vehicles as well as hard negative examples (boats, commercial vehicles etc.). Although the dataset doesn't plug straight into the classification network, the only preprocessing work here involves reading through the list of annotation text files and cropping the associated jpegs with either OpenCV or ImageMagick. (N.B. the COWC dataset has 4 class labels: Sedan, Pickup, Others and Unknown. With an additional background class, that makes 5 classes. I had to play around with class balancing to ensure each class was sufficiently represented, and also mine some samples for the background class, which weren't provided in COWC.)

The COWC dataset for the pickup class and sedan class

These images have been extracted from the original COWC format, which is comprised of large 22000×22000 images with bounding box information contained in ancillary .txt files. One thing to note is that the original bounding box coordinates tended to crop off large sections of the vehicles in question, which ultimately led to worse overall performance in identifying the pickup class. Therefore these images have all been cropped using larger 34×34 bounding boxes, roughly as sketched below.
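As a rough illustration of that cropping step (the annotation format assumed here, a class label plus a box centre per line, is a guess; check the actual COWC files before relying on it):

import cv2

PATCH = 34  # crop size used above, in pixels

def crop_vehicles(image_path, annotation_path, out_dir):
    """Cut fixed-size vehicle chips out of a large aerial image."""
    img = cv2.imread(image_path)
    with open(annotation_path) as f:
        for i, line in enumerate(f):
            label, cx, cy = line.split()[:3]          # assumed: class, centre x, centre y
            cx, cy, half = int(float(cx)), int(float(cy)), PATCH // 2
            chip = img[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
            if chip.size:
                cv2.imwrite(f"{out_dir}/{label}_{i}.jpg", chip)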

Classification loss after 30k iterations, note the large variation in performance within a mini-batch. The orange line represents training accuracy and the blue line represents validation accuracy.

By looking at the COWC dataset we can immediately tell that the resolution of the dataset is not ideal. Early training results showed that the classification network performed exceedingly well in the binary classification task of identifying occupied/unoccupied spaces, but performed worse in distinguishing between vehicle classes. Part of the reason for this is that the pickup class was severely underrepresented. Testing proved that undersampling the sedan/other/background classes yielded the best validation accuracy (as opposed to oversampling the pickup class).

Once the model was sufficiently trained (on a good GPU this takes a couple hours — for me it was a Tesla K80 on the GeoAI VM for about 1.5 hours at a training accuracy of 0.85 and a validation accuracy of 0.84), we can proceed to apply our previously extracted prediction set to the model.

Output these results into a .csv file (it might be useful to apply some smart naming conventions here to ensure your data items match the OBJECTIDs of each polygon in the parking space feature layer). From here, simply combine the two layers using Add Join and voilà, you have a polygon layer that associates a class probability with each parking space based on an aerial image you took.

Thus far, we have only created some rich geotagged layers that are not yet informative nor intuitive enough to give any kind of analytical insight. This is where ArcGIS Online offers us a path forward: we export both the orthomosaic as well as the feature layer to our ArcGIS Online Portal, then optionally, in the MapViewer, modify the symbology of our polygon layer to be attribute-driven.

Sharing a feature layer as a web layer
Editing the feature class in MapViewer allows you to modify the default symbology. In my case, blue = sedan, red = pickup trucks, while transparency is determined by classification confidence.

You can then import individual maps and visualize the results interactively on the Operations Dashboard:

The Ops Dashboard reveals insight at a glance

We scheduled two drone flights over the same parking lot at different times of the day. To complement this data, we have also generated some mock input to illustrate occupancy info on an hourly basis that simulates customer flow within a typical work day. The Hour Selector cycles through parking patterns over every time segment. To suit your use case, you may also decide to deploy a drone every few weeks or every few months. The Average Occupancy Heatmap is a visual representation of "hot spots" wherein vehicles aggregate. If you like, it is also possible to generate a "Turnover Heatmap" that maps regions in which vehicles are likely to park for longer or shorter periods. Both of these views are potentially useful for understanding demographic behavior when cross-referenced with vehicle types, or for detecting favorite stores & customer stay-times at shopping venues.

The two useful vehicle categories from COWC: sedans and pickups, are shown in the Vehicle Types visual element, each color coded and also with a transparency level linked to its classification confidence. Finally, pure vehicle counting/detection is presented in the Current Vehicle Classification Analysis view with a gauge to show the current occupancy.

Shortcomings of a classification-only model

Our blind assumption for a classification-only approach is that all cars fit neatly inside each parking space polygon (failing to take into account bad drivers, double-parkers or your regular F150s so easily cropped off by the sensibly-sized parking spaces). Of course, there are other use cases for vehicle detection for which a predefined polygon layer is simply impossible to draw (think roadside parking, or parking lots for which there are no guidelines). These, coupled with the fact that a simple classification network is simply not "smart enough", prompted us to think of another approach to this problem.


Better Approach: Building a detection model using Faster-RCNN

The fundamental idea behind a Faster-RCNN network is that it does two things at once: It detects the bounding boxes of objects of interest using a Region Proposal Network (RPN), and performs classification on those detections using a base classifier after region of interest pooling (ROI).

Faster R-CNN architecture, click here for a more comprehensive article

Like our previous attempt, we used COWC for fine-tuning. This time, there is no need to extract individual vehicles for training. To visualize what the dataset looks like, you can test out the following snippet by replacing the (x,y), width and height values in the patches.Rectangle() method with whatever value is shown in an image’s corresponding .txt file.

Snippet to visualize COWC bounding box annotations
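The embedded snippet did not survive the save; a minimal stand-in along the lines described above (the file name and box values are placeholders to be replaced with values from a patch's .txt annotation file):

import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.image as mpimg

img = mpimg.imread('cowc_patch.jpg')        # placeholder image tile

fig, ax = plt.subplots(1)
ax.imshow(img)
# Replace (x, y), width and height with values from the corresponding .txt file
ax.add_patch(patches.Rectangle((120, 80), 34, 34,
                               linewidth=1, edgecolor='r', facecolor='none'))
plt.show()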
Output of a COWC patch

I also had to manually convert these images into Pascal VOC format to be consumed by the Faster R-CNN model. Unlike the classification model, training this model took roughly 24 hours on the K80. This is because in Faster R-CNN, anchor generation within the RPN forms a bottleneck (potentially generating up to 6000 region proposals per image). Other detection models such as SSD or YOLO (at least the first generation) ameliorate the speed issue at the cost of lower mAP scores. However, seeing as we are not concerned with near real-time detection, it suffices to use Faster R-CNN to produce a good output at reasonable speeds (~7 seconds or so on the CPU for each 600×600 image tile, rescaled from the original 5000×5000 px). Again, Faster R-CNN is provided in the TensorFlow models library. There you can also find some default configuration files for the base classifier you have chosen (ResNet101 pretrained on the MS COCO dataset in my case). There are in fact a whole slew of pretrained base models from TensorFlow's detection model zoo you can choose for yourself, each with their own speed/mAP trade-off.

This is part of the configuration file I used; note the height and width values that match the COWC "patch_size", or size of each input image tile. This can be configured in the CreateDetectionScenes.py script from COWC's Github page. Additionally, you can choose to base your classification model on something other than ResNet101. Depending on the size of your detections, it may also be prudent to adjust the scales of your anchors for faster convergence, although I find it much more effective to re-tile your inputs if each object ends up comprising less than 4% of your input size.
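The configuration screenshot isn't reproduced in this copy; the relevant portion of a TensorFlow Object Detection API pipeline config would look roughly like this (the values called out in the text are kept, the rest are assumptions):

# Excerpt of a Faster R-CNN pipeline.config (sketch only)
model {
  faster_rcnn {
    num_classes: 4                      # sedan, pickup, other, unknown
    image_resizer {
      fixed_shape_resizer {
        height: 600                     # matches the COWC patch_size / input tile size
        width: 600
      }
    }
    feature_extractor {
      type: "faster_rcnn_resnet101"     # base classifier; could be swapped for another backbone
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]   # adjust anchor scales to the size of your objects
        aspect_ratios: [0.5, 1.0, 2.0]
      }
    }
  }
}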

Faster R-CNN loss. Note the fast-converging RPN loss, meaning it is much easier for the network to place a bounding box than to determine the class of vehicle inside an existing bounding box.

You can of course write your own evaluation script to visualize the trained model. There is in fact a very good template on TensorFlow's Github page. I made some modifications to the following snippet to also allow you to adjust the detection threshold and the number of boxes to draw, which I find very useful for visually understanding the performance of your model early on in the training process:
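That modified snippet isn't preserved in this copy; the two knobs described map onto the visualization utility in the TensorFlow Object Detection API roughly like this (a sketch, with illustrative parameter values):

from object_detection.utils import visualization_utils as vis_util

# image_np, boxes, classes, scores and category_index come from the standard
# TensorFlow Object Detection inference template
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    boxes,
    classes,
    scores,
    category_index,
    use_normalized_coordinates=True,
    min_score_thresh=0.5,      # detection threshold to adjust
    max_boxes_to_draw=100,     # number of boxes to draw
    line_thickness=2)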

Preliminary evaluation result which shows an abundance of the “Other” category. You can reduce this with a simple thresholding function either in script or in ArcGIS Pro.

Alternatively, if you wish for a more straightforward approach to performing inferencing, the upcoming ArcGIS Pro 2.3 offers a convenient geoprocessing tool, "Detect Objects Using Deep Learning", which performs evaluation on any input raster when you pass it a trained model in the form of a frozen inference graph protobuf defined inside a model description JSON file, along with several evaluation hyperparameters such as padding and detection threshold. Hit run and you get a new feature layer of bounding boxes in return.

Applying the Detect Objects Using Deep Learning geoprocessing tool for inferencing

Similar to the classification approach, we can visualize these layers much more effectively by exporting them to ArcGIS Online and viewing them in the Operations Dashboard.

Vehicle detection dashboard, note the successful detection of vehicles that are not parked within designated parking spaces. The morning/afternoon detections are also shown to reveal changes in parking behavior.

All the views from the classification dashboard can also be represented here with the same analyses drawn, but now vehicles parked outside of predesignated spaces (including vehicles in motion) are all detected using Faster R-CNN.

Closing thoughts and future work

This article serves as an exploratory glimpse into machine learning-driven GIS for vehicle detection, and of how important business decisions can be informed from start to finish by leveraging the powerful Esri ecosystem to create a complete BI workflow.

The models described above can of course be taken a couple steps further, by training on an input set with richer annotations or cross-referencing the COWC dataset with datasets that tie together vehicle types with income and other demographic info to produce even more powerful analyses, reveal subtler patterns and yield finer customer segmentations. Likewise, these models apply to aerial detection of all kinds (crops, utility poles, animals, forest fires, ships), as long as you have access to a decent GPU and a fine-tuning dataset.

Another potential expansion to this pipeline would be to introduce oblique imagery into the mix in a manner similar to what was done in Esri’s Oblique Viewer App that allows for multiple oblique views to be attached to the same ortho frame. For vehicle detection this effectively increases the number of images from which we can extract high fidelity data and also gives an unobstructed view of cars hidden under tree canopies.

Hopefully this has been an interesting read, please give us a clap and share this post if you enjoyed it, and let us know in the comments what other insights can be drawn from these data and whether you think there’s a better approach to be considered!

This effort was done as part of the Esri GeoAI team. For other cool GISxML projects, check out the GeoAI Medium page here. Feel free to contact Omar Maher for internship or full time opportunities! ([email protected])
