<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<!-- Meta tags for social media banners, these should be filled in appropriately as they are your "business card" -->
<!-- Replace the content tag with appropriate information -->
<meta name="description" content="DESCRIPTION META TAG">
<meta property="og:title" content="DEEP-EM TOOLBOX" />
<meta property="og:description"
content="Unlock the power of Deep Learning in Electron Microscopy with the DEEP-EM TOOLBOX standardized workflows for EM image analysis." />
<meta property="og:url" content="URL OF THE WEBSITE" />
<!-- Path to banner image, should be in the path listed below. Optimal dimensions are 1200x630-->
<meta property="og:image" content="static/image/your_banner_image.png" />
<meta property="og:image:width" content="1200" />
<meta property="og:image:height" content="630" />
<meta name="twitter:title" content="DEEP-EM TOOLBOX">
<meta name="twitter:description"
content="Unlock the power of Deep Learning in Electron Microscopy with the DEEP-EM TOOLBOX standardized workflows for EM image analysis.">
<!-- Path to banner image, should be in the path listed below. Optimal dimensions are 1200x600-->
<meta name="twitter:image" content="static/images/your_twitter_banner_image.png">
<meta name="twitter:card" content="summary_large_image">
<!-- Keywords for your paper to be indexed by-->
<meta name="keywords" content="Deep Learning, Electron Microscopy, Data Analysis, Data Interpretation, Toolbox">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>DEEP-EM TOOLBOX</title>
<link rel="icon" type="image/x-icon" href="static/images/icon.png">
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
<link rel="stylesheet" href="static/css/bulma.min.css">
<link rel="stylesheet" href="static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="static/css/bulma-slider.min.css">
<link rel="stylesheet" href="static/css/fontawesome.all.min.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="static/css/index.css">
<style>
.hidden {
display: none;
}
button.round-button {
background-color: white;
border: none;
border-radius: 50%;
box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
color: black;
padding: 13px 13px;
font-size: 12px;
cursor: pointer;
transition: box-shadow 0.2s ease;
width: 45px;
height: 45px;
}
button.round-button:hover {
box-shadow: 0 6px 8px rgba(0, 0, 0, 0.15);
}
button.round-button:active {
box-shadow: 0 3px 5px rgba(0, 0, 0, 0.2);
}
.button-55 {
align-self: center;
background-color: #fff;
background-image: none;
background-position: 0 90%;
background-repeat: repeat no-repeat;
background-size: 4px 3px;
border-radius: 15px 225px 255px 15px 15px 255px 225px 15px;
border-style: solid;
border-width: 2px;
box-shadow: rgba(0, 0, 0, .2) 15px 28px 25px -18px;
box-sizing: border-box;
color: #41403e;
cursor: pointer;
display: inline-block;
font-family: Neucha, sans-serif;
font-size: 1rem;
line-height: 23px;
outline: none;
padding: .75rem;
text-decoration: none;
transition: all 235ms ease-in-out;
border-bottom-left-radius: 15px 255px;
border-bottom-right-radius: 225px 15px;
border-top-left-radius: 255px 15px;
border-top-right-radius: 15px 225px;
user-select: none;
-webkit-user-select: none;
touch-action: manipulation;
}
.button-55:hover {
box-shadow: rgba(0, 0, 0, .3) 2px 8px 8px -5px;
transform: translate3d(0, 2px, 0);
}
.button-55:focus {
box-shadow: rgba(0, 0, 0, .3) 2px 8px 4px -6px;
}
.gray-background {
background-color: rgb(228, 228, 228);
/* Gray background */
padding: 20px;
/* Padding inside the element */
border-radius: 10px;
/* Rounded corners */
box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);
/* Subtle shadow for depth */
font-size: 16px;
/* Font size */
margin: 20px 0;
/* Margin outside the element */
}
.green-background {
background-color: #dde9afff;
/* Green background */
padding: 20px;
/* Padding inside the element */
border-radius: 10px;
/* Rounded corners */
box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);
/* Subtle shadow for depth */
font-size: 16px;
/* Font size */
margin: 20px 0;
/* Margin outside the element */
}
.red-background {
background-color: #ffaaaaff;
/* Red background */
padding: 20px;
/* Padding inside the element */
border-radius: 10px;
/* Rounded corners */
box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);
/* Subtle shadow for depth */
font-size: 16px;
/* Font size */
margin: 20px 0;
/* Margin outside the element */
}
.orange-background {
background-color: #ffb380ff;
/* Orange background */
padding: 20px;
/* Padding inside the element */
border-radius: 10px;
/* Rounded corners */
box-shadow: 0 4px 8px rgba(0, 0, 0, 0.2);
/* Subtle shadow for depth */
font-size: 16px;
/* Font size */
margin: 20px 0;
/* Margin outside the element */
}
</style>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script src="https://documentcloud.adobe.com/view-sdk/main.js"></script>
<script defer src="static/js/fontawesome.all.min.js"></script>
<script src="static/js/bulma-carousel.min.js"></script>
<script src="static/js/bulma-slider.min.js"></script>
<script src="static/js/index.js"></script>
<script src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
<script>
// Toggle a collapsible section and update its "Show More"/"Show Less" button.
function toggleVisibility(id_content, id_button) {
var content = document.getElementById(id_content);
var btn = document.getElementById(id_button);
if (content.classList.contains('hidden')) {
content.classList.remove('hidden');
btn.innerHTML = "Show Less";
} else {
content.classList.add('hidden');
btn.innerHTML = "Show More";
}
}
// Same toggle for the round buttons, switching the triangle glyph instead.
function toggleVisibility_triangle(id_content, id_button) {
var content = document.getElementById(id_content);
var btn = document.getElementById(id_button);
if (content.classList.contains('hidden')) {
content.classList.remove('hidden');
btn.innerHTML = "▼";
} else {
content.classList.add('hidden');
btn.innerHTML = "▶";
}
}
</script>
</head>
<body>
<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<img src="static/images/icon.png"
alt="Schematic showing 3 differnt types of task applicable for deep learning. (image to values, image to image & 2D to 3D)" />
<h1 class="title is-1 publication-title">DEEP-EM TOOLBOX:</h1>
<h2 class="title is-1 publication-title">Deep Learning Toolbox for Electron Microscopy Researchers</h2>
<div class="is-size-5 publication-authors">
<!-- Paper authors -->
<span class="author-block">
<a href="https://viscom.uni-ulm.de/members/hannah-kniesel/" target="_blank">Hannah
Kniesel</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://viscom.uni-ulm.de/members/tristan-payer/" target="_blank">Tristan
Payer</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://viscom.uni-ulm.de/members/poonam/" target="_blank">Poonam Poonam</a><sup>1</sup>,
</span>
<span class="author-block">
<a href="" target="_blank">Tim Bergner</a><sup>2</sup>,
</span>
<span class="author-block">
<a href="https://phermosilla.github.io/" target="_blank">Pedro Hermosilla</a><sup>3</sup>
</span>
<span class="author-block">
<a href="https://viscom.uni-ulm.de/members/timo-ropinski/" target="_blank">Timo
Ropinski</a><sup>1</sup>
</span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block"><sup>1</sup>Visual Computing Group, Ulm University<br><sup>2</sup>Central
Facility for Electron Microscopy, Ulm University<br><sup>3</sup>Computer Vision Lab, TU Vienna</span>
<!-- <span class="eql-cntrb"><small><br><sup>*</sup>Indicates Equal Contribution</small></span> -->
</div>
<div class="column has-text-centered">
<div class="publication-links">
<!-- Arxiv PDF link -->
<span class="link-block">
<a href="https://arxiv.org/pdf/<ARXIV PAPER ID>.pdf" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
</span>
<span>Paper</span>
</a>
</span>
<!-- Supplementary PDF link
<span class="link-block">
<a href="static/pdfs/supplementary_material.pdf" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
</span>
<span>Supplementary</span>
</a>
</span>-->
<!-- Github link
<span class="link-block">
<a href="https://github.com/YOUR REPO HERE" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-github"></i>
</span>
<span>Code</span>
</a>
</span> -->
<!-- ArXiv abstract Link
<span class="link-block">
<a href="https://arxiv.org/abs/<ARXIV PAPER ID>" target="_blank"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="ai ai-arxiv"></i>
</span>
<span>arXiv</span>
</a>
</span>-->
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<!-- Teaser image
<section class="hero teaser">
<div class="container is-max-desktop">
<div class="hero-body">
<img src="static/images/tasks.png" alt="Schematic showing 3 differnt types of task applicable for deep learning. (image to values, image to image & 2D to 3D)" />
<h2 class="subtitle has-text-centered">We propose to categorize tasks within the area of EM data analysis into Image to Value(s), Image to Image and 2D to 3D. We do so, based on their specific requirements for implementing a deep learning workflow. For more details, please see our paper.</h2>
</div>
</div>
</section>
End teaser image -->
<!-- Teaser image
<section class="hero teaser">
<div class="container is-max-desktop">
<div class="hero-body">
<img src="static/images/workflow.png" alt="Standard Deep Learning Workflow" />
<h2 class="subtitle has-text-centered">
Figure 1: We propose a simple workflow for developing deep learning solutions for the supported analysis of EM data.
The workflow is clustered into three categories: 1) Task; 2) Data; 3) Model</h2>
</div>
</div>
</section>
End teaser image -->
<!-- Paper abstract -->
<section class="section hero is-light">
<div class="container is-max-desktop">
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<div class="content has-text-justified">
<p>
Despite advancements in Computer Vision (<abbr title="Computer Vision">CV</abbr>), the application of Deep
Learning (<abbr title="Deep Learning">DL</abbr>) in Electron Microscopy
(<abbr title="Electron Microscopy">EM</abbr>) labs remains limited. This paper outlines
various application areas within <abbr title="Electron Microscopy">EM</abbr> and introduces the DEEP-EM
TOOLBOX, which supports the application and adaptation of <abbr title="Deep Learning">DL</abbr> solutions
within <abbr title="Electron Microscopy">EM</abbr> labs.
With this DEEP-EM TOOLBOX we aim to bridge the gap between <abbr
title="Deep Learning">DL</abbr> experts and <abbr title="Electron Microscopy">EM</abbr> researchers,
acknowledging the significant potential of <abbr title="Deep Learning">DL</abbr> for enhancing the
analysis of <abbr title="Electron Microscopy">EM</abbr> micrographs. With its proven success in <abbr
title="Computer Vision">CV</abbr> tasks, <abbr title="Deep Learning">DL</abbr> can revolutionize <abbr
title="Electron Microscopy">EM</abbr> image analysis through supported, automated, and standardized
methodologies.
Our primary objective is to foster interdisciplinary collaboration between domain experts and data
scientists, addressing differences in terminology and expertise. We therefore introduce this toolbox to
compile recent advancements in <abbr title="Deep Learning">DL</abbr> for <abbr
title="Electron Microscopy">EM</abbr>.
We believe that, as <abbr title="Electron Microscopy">EM</abbr> is an active and rapidly changing field of
research, a "one-fits-all" model is not applicable. We therefore propose to categorize possible <abbr
title="Electron Microscopy">EM</abbr>-specific use cases into three tasks: Image to Value(s), Image to
Image, and 2D to 3D. We demonstrate the capabilities of the toolbox with three exemplary use cases:
viral particle quantification, cellular structure segmentation, and tomographic reconstruction.
The use cases are designed for plug-and-play use by <abbr title="Electron Microscopy">EM</abbr>
researchers, such that they can be easily adapted to new datasets and requirements.
We introduce a standardized workflow for implementing <abbr title="Deep Learning">DL</abbr>-based
solutions, making adaptations of the use cases more accessible.
We encourage the research community to contribute and make their own <abbr
title="Deep Learning">DL</abbr> approaches accessible within the toolbox.
</p>
<p>More specifically, we </p>
<ul>
<li>developed a standardized workflow for implementing deep learning (DL) models for electron microscopy
(EM) data analysis, streamlining future adaptations.</li>
<li>categorized DL methods in EM use cases into three tasks: Image to Value(s), Image to Image, and 2D to
3D, enabling targeted solutions.</li>
<li>implemented three DL use cases for EM, using Lightning AI Studio with Jupyter notebooks for virus
quantification, cell structure segmentation, and tomographic reconstruction.</li>
</ul>
</div>
</div>
</div>
</div>
</section>
<!-- End paper abstract -->
<!--Intro to Deep Learning -->
<section class="section hero">
<div class="container is-max-desktop content">
<h2 class="title is-3">Deep Learning Terminology</h2>
<div class="content has-text-justified">
<p>Deep Learning has emerged as a powerful tool within artificial intelligence.
A deep learning model is, in theory, able to approximate any function \( f_{\theta}(x) = \hat{y} \),
where \( x \) is some input data
(like a micrograph of a virus-infected cell) and \( \hat{y} \) is the network's output. During training of the
neural network, the function's parameters \( \theta \)
(often referred to as <i>trainable parameters</i>) are adjusted such that \( \hat{y} = y \), where \(
y \) is the desired output of the model (like the number of virus capsids
present in the input image); \( y \) is often called the "label", "ground truth", "target", or "annotation".
To train the network, we define a <i>loss function</i> \( L(\hat{y}, y) \), where \( \hat{y} =
f_{\theta}(x) \) is the network's output, which measures the network's
error. The parameters \( \theta \) of the network are then updated using <i>gradient descent</i>,
which aims to minimize the predefined loss function
over a large set of training data \( \{x_i\}_{i=1}^{N} \).
By using a large dataset with high variance in the data, we aim to make the network <i>generalizable</i>,
meaning that the learned function \( f \) maps input data that the network has not seen during training
to a correct prediction.
</p>
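<p>For illustration, a minimal sketch of one such training step, assuming PyTorch as the framework (the names <code>f</code>, <code>x</code>, and <code>y</code> are toy placeholders, not part of the toolbox):</p>
<pre><code># Minimal sketch of y_hat = f_theta(x), the loss L(y_hat, y), and one gradient step.
import torch

x = torch.randn(8, 4)   # 8 toy training samples with 4 features each
y = torch.randn(8, 1)   # desired outputs ("ground truth")

# f_theta: a tiny neural network whose trainable parameters are theta.
f = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))

loss_fn = torch.nn.MSELoss()                           # loss function L(y_hat, y)
optimizer = torch.optim.SGD(f.parameters(), lr=0.01)   # gradient descent on theta

y_hat = f(x)                 # network output y_hat = f_theta(x)
loss = loss_fn(y_hat, y)     # measure the network's error
loss.backward()              # compute gradients of the loss w.r.t. theta
optimizer.step()             # update theta to reduce the loss
</code></pre>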
<p>In the following, we give a short overview of the most common terminology used in the context of deep learning.
</p>
</div>
<button id="togglebtn-terminology" class="button-55"
onclick="toggleVisibility('content-terminology', 'togglebtn-terminology')">Show More</button>
<div id="content-terminology" class="hidden">
<p></p>
<button id="btn-loss" class="round-button"
onclick="toggleVisibility_triangle('loss', 'btn-loss')">▶</button>
<strong>Loss function</strong>
<div id="loss" class="hidden">
<p>
is a mathematical function that quantifies the difference between the predicted
output of a neural network and the actual target value (often also referred to as <i>annotation</i>,
<i>ground
truth</i> or <i>label</i>). It serves as a crucial component in training deep learning models by providing
a
measure of how well or poorly the model is performing. The primary objective during training is to minimize
this loss function, which in turn improves the model's predictions.
</p>
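<p>A minimal sketch, assuming PyTorch: two common built-in loss functions quantifying the prediction error (the tensors are toy values):</p>
<pre><code>import torch

mse = torch.nn.MSELoss()           # typical loss for regression tasks
ce = torch.nn.CrossEntropyLoss()   # typical loss for classification tasks

y_hat, y = torch.tensor([[2.5]]), torch.tensor([[3.0]])
print(mse(y_hat, y))               # tensor(0.2500): the squared error (2.5 - 3.0)^2

logits = torch.tensor([[1.2, -0.3]])   # unnormalized scores for two classes
target = torch.tensor([0])             # the true class index
print(ce(logits, target))              # small when the true class scores highest
</code></pre>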
</div>
<p></p>
<p></p>
<button id="btn-metric" class="round-button"
onclick="toggleVisibility_triangle('metric', 'btn-metric')">▶</button>
<strong>Metric</strong>
<div id="metric" class="hidden">
<p>In the context of deep learning, a metric is a quantitative measure used to evaluate
the performance of a model. Metrics provide insights into how well the model is performing on tasks such as
classification, regression, or other predictive tasks by comparing the model's predictions to the actual
ground truth values. Metrics help in assessing the effectiveness of the model, guiding the tuning of
hyperparameters, and making decisions about model improvements. Unlike loss functions, which are optimized
during training, metrics are primarily used for evaluation purposes, providing a clearer understanding of
the
model's predictive capabilities and generalization to unseen data.
</p>
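<p>As a small illustration (assuming PyTorch and toy values), accuracy is a typical metric computed by comparing predictions against the ground truth:</p>
<pre><code>import torch

logits = torch.tensor([[2.0, 0.1], [0.2, 1.5], [1.0, 3.0]])  # model outputs
targets = torch.tensor([0, 1, 0])                            # ground truth classes

predictions = logits.argmax(dim=1)                  # predicted class per sample
accuracy = (predictions == targets).float().mean()  # fraction of correct predictions
print(accuracy)                                     # tensor(0.6667): 2 of 3 correct
</code></pre>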
</div>
<p></p>
<p></p>
<button id="btn-gradientdescent" class="round-button"
onclick="toggleVisibility_triangle('gradientdescent', 'btn-gradientdescent')">▶</button>
<strong>Gradient Descent</strong>
<div id="gradientdescent" class="hidden">
<p>Gradient descent is a fundamental optimization algorithm used in deep
learning to minimize the loss function. The algorithm iteratively adjusts the trainable parameters (weights
and biases) of the neural network to reduce this loss. The core idea involves computing the gradient
(partial
derivative) of the loss function with respect to each parameter. These gradients indicate the direction and
rate of change needed to decrease the loss. The parameters are then updated in the opposite direction of the
gradient, scaled by a learning rate, which controls the step size of the updates. Mathematically, the update
rule for a parameter \( \theta \) at update step \( t \) is given by \( \theta_t = \theta_{t-1} - \eta \nabla_{\theta} L \), where
\( \eta \) is the learning rate and \( \nabla_{\theta} L \) is the gradient of the loss \( L \) with respect to \( \theta \). This iterative process
continues until the algorithm converges to a minimum of the loss function, ideally reaching optimal
parameter
values that allow the neural network to make accurate predictions. Gradient descent variants, such as
stochastic gradient descent (SGD) and mini-batch gradient descent, improve efficiency and performance by
adjusting how the gradients are computed and applied.
</p>
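<p>A minimal sketch of one manual update step, assuming PyTorch's automatic differentiation (the loss here is a toy stand-in):</p>
<pre><code>import torch

theta = torch.randn(3, requires_grad=True)   # trainable parameters
lr = 0.1                                     # learning rate (eta)

loss = (theta ** 2).sum()   # toy loss with its minimum at theta = 0
loss.backward()             # compute the gradient of the loss w.r.t. theta

with torch.no_grad():
    theta -= lr * theta.grad   # theta_t = theta_{t-1} - eta * gradient
theta.grad.zero_()             # reset the gradient before the next step
</code></pre>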
</div>
<p></p>
<p></p>
<button id="btn-architecture" class="round-button"
onclick="toggleVisibility_triangle('architecture', 'btn-architecture')">▶</button>
<strong>Architectures</strong>
<div id="architecture" class="hidden">
<p> refer to the specific design and configuration of neural networks, dictating
how layers are arranged and interconnected. Common architectures include Convolutional Neural Networks
(CNNs)
for image processing, Recurrent Neural Networks (RNNs) for sequential data, and Transformer models for tasks
like natural language processing or image processing. Each architecture is tailored to handle specific types
of input and output dimensions, ensuring optimal processing and learning.
At the core of these architectures are neurons, the fundamental units of a neural network. A neuron receives
input, processes it using a set of weights, and then applies an activation function, such as ReLU (Rectified
Linear Unit), Sigmoid, or Tanh, to introduce non-linearity, enabling the network to learn complex functions.
Layers, which are collections of neurons, form the structural components of a neural network. There are
various types of layers, each serving a distinct purpose. For example, input layers handle the raw data,
hidden layers process the input through multiple transformations, and output layers produce the final
predictions. The architecture must also adapt the input dimensions, like the dimension of the input data,
and
the output dimensions, for example to ensure the correct number of classes in classification tasks, to suit
the problem being addressed. The thoughtful design of these architectures, the role of neurons, the
appropriate activation functions, and the strategic use of different types of layers are essential for the
network to effectively learn from the data and perform the desired tasks.
</p>
</div>
<p></p>
<p></p>
<button id="btn-hyperparameter" class="round-button"
onclick="toggleVisibility_triangle('hyperparameter', 'btn-hyperparameter')">▶</button>
<strong>Hyperparameters</strong>
<div id="hyperparameter" class="hidden">
<p>
in the context of deep learning are the parameters set before the training
process begins, which govern the overall behavior and performance of the neural network. Unlike model
parameters, which are learned during training, hyperparameters need to be manually defined. They include
aspects such as the learning rate, batch size, number of epochs, and architecture-specific choices like the
number of layers and units per layer. The choice of hyperparameters can significantly impact the model's
ability to learn effectively and generalize to new data. Tuning these hyperparameters is often a complex and
iterative process, involving techniques such as grid search, random search, or more sophisticated methods
like
Bayesian optimization to find the optimal settings that enhance model performance.
</p>
</div>
<p></p>
<p></p>
<button id="btn-training" class="round-button"
onclick="toggleVisibility_triangle('training', 'btn-training')">▶</button>
<strong>Training</strong>
<div id="training" class="hidden">
<p>
in deep learning is the process where a neural network learns from a dataset by
adjusting its weights to minimize the error of its predictions. The dataset is often too large to process
all
at once, so it is divided into smaller subsets called batches. A batch is a small, manageable portion of the
dataset used to update the model's weights. Training on batches is necessary because it allows for efficient
computation and memory usage, making it feasible to train large models on large datasets.
An iteration refers to a single update of the model's weights using one batch of data. Multiple iterations
make up an epoch, which is a complete pass through the entire training dataset. Training on batches helps
achieve a balance between speed and accuracy, as each batch update can quickly provide feedback to the
model,
allowing it to adjust its weights incrementally.
Using a batch size of 1, also known as online learning, can be inefficient and noisy. With a batch size of
1,
the model's weights are updated after every single data point, leading to highly variable gradient updates
that can make the training process unstable and slow. Larger batch sizes help in smoothing out these
updates,
providing more stable and reliable gradients, which can lead to more efficient convergence.
Throughout many epochs, the model iteratively processes batches of data, computes predictions, and updates
its
parameters using optimization algorithms such as stochastic gradient descent. The goal is to minimize a
predefined loss function that quantifies the discrepancy between the predicted outputs and the actual
targets.
By iteratively refining its weights through batch processing, the model learns the underlying patterns in
the
data effectively, leading to improved performance and generalization.
</p>
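<p>A minimal sketch of the epoch/batch/iteration structure, assuming PyTorch (dataset and model are toy placeholders):</p>
<pre><code>import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 4), torch.randn(100, 1))  # 100 samples
loader = DataLoader(dataset, batch_size=10, shuffle=True)  # 10 iterations per epoch

model = torch.nn.Linear(4, 1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):                # one epoch = a full pass over the dataset
    for x_batch, y_batch in loader:   # one iteration = one batch update
        optimizer.zero_grad()
        loss = loss_fn(model(x_batch), y_batch)
        loss.backward()
        optimizer.step()
</code></pre>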
</div>
<p></p>
<p></p>
<button id="btn-learningrate" class="round-button"
onclick="toggleVisibility_triangle('learningrate', 'btn-learningrate')">▶</button>
<strong>Learning Rate</strong>
<div id="learningrate" class="hidden">
<p>
The learning rate is a critical hyperparameter that determines the step size at
each iteration while moving towards a minimum of the loss function. A learning rate that is too high can
cause
the training process to converge too quickly to a suboptimal solution, or even diverge. Conversely, a
learning
rate that is too low can make the training process very slow, potentially getting stuck in local minima.
</p>
</div>
<p></p>
<p></p>
<button id="btn-learningratescheduler" class="round-button"
onclick="toggleVisibility_triangle('learningratescheduler', 'btn-learningratescheduler')">▶</button>
<strong>Learning Rate Scheduler</strong>
<div id="learningratescheduler" class="hidden">
<p>
To address the challenges of selecting a proper learning rate,
learning rate schedulers are used. These dynamically adjust the learning rate during training to improve
performance and convergence speed. Common strategies include:
<ul>
<li><i>Step Decay</i>: Reduces the learning rate by a factor at fixed intervals (epochs).</li>
<li><i>Exponential Decay</i>: Gradually decreases the learning rate exponentially over time.</li>
<li><i>Cosine Annealing</i>: Reduces the learning rate following a cosine curve, which can help in exploring
wider regions of the loss landscape initially and then fine-tuning as training progresses.</li>
<li><i>Cyclic Learning Rate</i>: Varies the learning rate cyclically between a minimum and maximum boundary,
which can help escape local minima and improve training performance.</li>
</ul>
</p>
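<p>A minimal sketch of two of these strategies using PyTorch's built-in schedulers (the epoch counts and decay factors are illustrative values):</p>
<pre><code>import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Step decay: multiply the learning rate by 0.5 every 10 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
# Alternative, cosine annealing over 50 epochs:
# scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

for epoch in range(50):
    # ... training iterations for this epoch ...
    scheduler.step()   # adjust the learning rate once per epoch
</code></pre>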
</div>
<p></p>
<p></p>
<button id="btn-optimization" class="round-button"
onclick="toggleVisibility_triangle('optimization', 'btn-optimization')">▶</button>
<strong>Optimization Algorithms</strong>
<div id="optimization" class="hidden">
<p>
Optimization algorithms are used to adjust the weights of the model
to minimize the loss function. Different optimizers offer various advantages depending on the problem and
the
dataset. Here are some commonly used optimizers:
<ul>
<li><i>Stochastic Gradient Descent (SGD)</i>: SGD updates the model's parameters using the gradient of the
loss function with respect to the parameters for each batch of data. It is simple and effective but can be
slow to converge and may oscillate near the minimum.</li>
<li><i>Momentum</i>: An extension of SGD, momentum helps accelerate SGD by navigating in the relevant
direction and dampening oscillations. It accumulates a velocity vector in the direction of the gradient's
consistent component, speeding up the training process.</li>
<li><i>Adagrad</i>: Adagrad adapts the learning rate for each parameter based on its gradients' historical
sum. It is particularly useful for dealing with sparse data but can suffer from decaying learning rates
over
time.</li>
<li><i>RMSprop</i>: RMSprop adjusts the learning rate for each parameter by dividing by a running average of
recent gradients' magnitudes. It mitigates Adagrad's issue of decaying learning rates and performs well in
practice.</li>
<li><i>Adam</i>: Adam (Adaptive Moment Estimation) combines the benefits of both Adagrad and RMSprop. It
computes adaptive learning rates for each parameter using the first and second moments of the gradients.
Adam is widely used due to its robust performance across various tasks.</li>
<li><i>AdamW</i>: An extension of Adam, AdamW decouples weight decay (used for regularization) from the
gradient updates. This improves the optimizer's performance, particularly when using L2 regularization.
</li>
</ul>
</p>
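<p>For reference, a sketch of how these optimizers are instantiated in PyTorch (the hyperparameter values are illustrative, not recommendations):</p>
<pre><code>import torch

model = torch.nn.Linear(4, 1)

sgd      = torch.optim.SGD(model.parameters(), lr=0.01)                # plain SGD
momentum = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # SGD + momentum
adagrad  = torch.optim.Adagrad(model.parameters(), lr=0.01)
rmsprop  = torch.optim.RMSprop(model.parameters(), lr=0.001)
adam     = torch.optim.Adam(model.parameters(), lr=0.001)
adamw    = torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)
# In practice, exactly one of these is chosen and used in the training loop.
</code></pre>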
</div>
<p></p>
<p></p>
<button id="btn-batchsize" class="round-button"
onclick="toggleVisibility_triangle('batchsize', 'btn-batchsize')">▶</button>
<strong>Batch Size</strong>
<div id="batchsize" class="hidden">
<p>
The batch size is a crucial hyperparameter in deep learning training that defines
the number of samples processed before the model's internal parameters are updated. It influences both the
learning dynamics and computational efficiency of the training process. Choosing the right batch size
involves
balancing several trade-offs. Smaller batch sizes (e.g., 32 or 64) provide more frequent updates to the
model
parameters, which can lead to a smoother convergence and better generalization to new data. However, they
may
introduce higher noise in the gradient estimates, which can make the training process less stable. Larger
batch sizes (e.g., 256 or 512) offer more accurate gradient estimates and can leverage parallel processing
capabilities of modern GPUs more efficiently, potentially speeding up the training process. Yet, they
require
more memory and can lead to less frequent updates, which might result in slower convergence and risk of
getting stuck in local minima. Empirically, a batch size that balances these factors is typically chosen
based
on the specific dataset and computational resources available. Adaptive strategies, such as progressively
increasing the batch size during training, can also be employed to combine the benefits of both small and
large batch sizes.
</p>
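<p>A minimal sketch, assuming PyTorch: the batch size is set on the data loader (the dataset here is a toy placeholder):</p>
<pre><code>import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 4), torch.randn(1024, 1))

# Smaller batches: more frequent but noisier updates (32 iterations per epoch here).
small_batches = DataLoader(dataset, batch_size=32, shuffle=True)
# Larger batches: fewer, smoother updates and higher memory use (4 iterations).
large_batches = DataLoader(dataset, batch_size=256, shuffle=True)
</code></pre>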
</div>
<p></p>
<p></p>
<button id="btn-validation" class="round-button"
onclick="toggleVisibility_triangle('validation', 'btn-validation')">▶</button>
<strong>Validation</strong>
<div id="validation" class="hidden">
<p>
is a critical step in deep learning used to evaluate the model's performance on a
separate dataset not seen during training. This dataset, called the validation set, is used to tune
hyperparameters, select the best model, and prevent overfitting. Overfitting occurs when a model learns the
training data too well, capturing noise and specific patterns that do not generalize to new, unseen data.
This
leads to poor performance on validation or test sets. In contrast, generalization is the model's ability to
perform well on new, unseen data, indicating that it has learned the underlying patterns in the training
data
without memorizing it. During training, the model's performance on the validation set is monitored, and
adjustments are made to improve generalization. This helps ensure that the model does not just memorize the
training data but learns to generalize to new, unseen data, enhancing its robustness and applicability in
real-world scenarios.
</p>
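<p>A minimal sketch of a validation pass, assuming PyTorch (<code>val_loader</code> and <code>metric_fn</code> are assumed to be provided elsewhere):</p>
<pre><code>import torch

def validate(model, val_loader, metric_fn):
    """Evaluate the model on held-out data without updating any weights."""
    model.eval()              # disable training-only behavior (dropout etc.)
    scores = []
    with torch.no_grad():     # no gradients are needed for evaluation
        for x, y in val_loader:
            scores.append(metric_fn(model(x), y).item())
    model.train()             # switch back to training mode
    return sum(scores) / len(scores)   # average validation metric
</code></pre>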
</div>
<p></p>
<p></p>
<button id="btn-test" class="round-button"
onclick="toggleVisibility_triangle('test', 'btn-test')">▶</button>
<strong>Test</strong>
<div id="test" class="hidden">
<p>
The test phase, sometimes also referred to as inference, is where the trained model is
evaluated on a completely separate dataset called the test set. This dataset is used to assess the model's
final performance and its ability to generalize to new data. During inference, the model makes predictions
on
new data points, and its performance metrics (such as accuracy, precision, recall) are calculated. This
phase
is crucial for understanding how well the model will perform in real-world scenarios and ensures that the
model's performance is robust and reliable.
</p>
</div>
<p></p>
<p></p>
<button id="btn-supervisedlearning" class="round-button"
onclick="toggleVisibility_triangle('supervisedlearning', 'btn-supervisedlearning')">▶</button>
<strong>Supervised Learning</strong>
<div id="supervisedlearning" class="hidden">
<p>
is the standard approach of machine learning where the model is trained
on labeled data. In this paradigm, the training dataset consists of input-output pairs, where each input \(
x \)
is associated with a known output \( y \) (label). The goal of supervised learning is to learn a mapping
function
from inputs to outputs, allowing the model to make accurate predictions on new, unseen data. Supervised
learning is widely used in various domains such as image recognition, speech recognition, and medical
diagnosis, due to its effectiveness in learning from explicit examples.
</p>
</div>
<p></p>
<p></p>
<button id="btn-weaklysupervisedlearning" class="round-button"
onclick="toggleVisibility_triangle('weaklysupervisedlearning', 'btn-weaklysupervisedlearning')">▶</button>
<strong>Weakly Supervised Learning</strong>
<div id="weaklysupervisedlearning" class="hidden">
<p>
is a machine learning approach where the model is trained using
partially labeled or noisy data, as opposed to fully labeled data in traditional supervised learning. In
weakly supervised learning, the training dataset may contain only high-level labels, partial labels, or
noisy
labels, which provide limited or ambiguous information about the ground truth. Despite the challenges posed
by
the lack of precise labels, weakly supervised learning algorithms aim to infer meaningful patterns and
relationships from the available data to make predictions or perform tasks. This approach often requires
innovative techniques, such as label aggregation, data augmentation, or learning from indirect supervision
signals. Weakly supervised learning is particularly useful in scenarios where obtaining fully labeled data
is
expensive, time-consuming, or impractical, allowing models to be trained on larger, more diverse datasets.
Additionally, weak supervision can function as implicit standardization: for example, when human opinion on
the full annotation is ambiguous, the annotations in a weak scenario might be unambiguous. Hence, the model
is able to learn a standardization from the unambiguous weak labels.
</p>
</div>
<p></p>
<p></p>
<button id="btn-unsupervisedlearning" class="round-button"
onclick="toggleVisibility_triangle('unsupervisedlearning', 'btn-unsupervisedlearning')">▶</button>
<strong>Unsupervised Learning</strong>
<div id="unsupervisedlearning" class="hidden">
<p>
is particularly valuable for pretraining models on large, unlabeled
datasets. Instead of relying on labeled examples, unsupervised learning algorithms explore the raw input
data
to extract meaningful features or representations without explicit guidance. Pretraining involves training a
model on a large amount of unlabeled data to learn general patterns and structures in the data. Once
pretrained, the model can be fine-tuned on smaller labeled datasets for specific tasks, such as
classification
or regression. Fine-tuning adjusts the pretrained model's parameters to make it better suited for the
specific
task at hand, leveraging the knowledge gained during pretraining. Unsupervised pretraining followed by
fine-tuning has proven to be effective in improving model performance, especially in scenarios where labeled
data is scarce or expensive to obtain. It plays a crucial role in various applications such as natural
language processing, computer vision, and speech recognition, enabling the development of more accurate and
robust deep learning models.
</p>
</div>
<p></p>
</div>
</div>
</section>
<!--End intro to Deep Learning -->
<!--Workflow -->
<section class="section hero is-light">
<div class="container is-max-desktop content">
<h2 class="title is-3">Workflow</h2>
<div class="hero-body">
<img src="static/images/workflow.png" alt="deep learning workflow in the context of EM." />
<h6 class="subtitle has-text-centered">
Figure 1: We propose a simple workflow for developing deep learning solutions for the supported analysis of EM
data.
The workflow is clustered into three categories: 1) Task; 2) Data; 3) Model.</h6>
</div>
<div class="content has-text-justified">
<p>The DEEP-EM TOOLBOX follows a generalizable workflow, which we describe in the following.
In our workflow (Figure 1) we propose three clusters:</p>
<ul>
<li>Task (orange).</li>
<li>Data (green).</li>
<li>Model (red).</li>
</ul>
<p>
The standardized workflow allows easier access to, and realization of, adaptations of the methods.
Additionally, we identify and analyze possible challenges in applying DL to EM data and
discuss how to tackle them.
</p>
<p></p>
<div class="orange-background">
<h3>Task</h3>
<p>In <abbr title="Deep Learning">DL</abbr>, a task refers to a specific problem or objective that one aims
to address.
This section will outline the necessary steps for defining tasks, providing a comprehensive foundation for
effectively applying <abbr title="Deep Learning">DL</abbr> techniques to EM image analysis.
Specifically, within this paper we categorize tasks in the area of <abbr
title="Electron Microscopy">EM</abbr> into three objectives: 1) Image to Value(s), 2) Image to Image, 3)
2D to 3D.
Each task defines the nature of the data interactions and the desired outcomes, guiding the development and
training of the model to perform effectively on that particular problem.</p>
<button id="togglebtn-task-model" class="button-55"
onclick="toggleVisibility('content-task-model', 'togglebtn-task-model')">Show More</button>
<div id="content-task-model" class="hidden">
<p></p>
<h4>Definition</h4>
<p>Task definition encompasses knowledge over the type of input data the model will process, the expected
output or prediction, and the overall goal of the analysis. The type of input data in the case of <abbr
title="Electron Microscopy">EM</abbr> usually corresponds to micrographs, making it well suited for the
application of <abbr title="Deep Learning">DL</abbr> methods which originate from the area of <abbr
title="Computer Vision">CV</abbr>. The output as well as the overall goal need to be defined
individually for each task in mind. We title the introduced tasks based on the required type of input data
and the expected output.</p>
<p><b>Image to Value(s)</b>
Tasks are defined by an image input and an output of one or multiple values. Common
examples
involve classification, regression, or detection.<br>
Classification in the context of <abbr title="Electron Microscopy">EM</abbr> refers to the process of
categorizing <abbr title="Electron Microscopy">EM</abbr> images or their specific regions into predefined
classes based on their visual characteristics. For example, it can be used to identify "good" or "bad"
imaging regions of the sample of interest [1]. This is done by making the
model predict a probability distribution over a predefined set of classes (for example,
<code>C = {"good", "bad"}</code>), modeling the probability that the input image belongs to each class.
<br>Regression tasks in <abbr title="Electron Microscopy">EM</abbr> refer to a type of predictive modeling
technique used to predict a continuous output variable based on an input micrograph. Unlike
classification, which assigns discrete classes to the input data, regression outputs a continuous value.
This technique is particularly valuable for tasks that require quantifying certain properties of an <abbr
title="Electron Microscopy">EM</abbr> image, such as the number of visible virus particles.
<br>Lastly, detection refers to the process of identifying and locating specific objects or features by a
bounding box within an image. Unlike simple classification, which assigns labels to entire images, or
regression, which predicts continuous values, detection combines both tasks: it involves pinpointing exact
bounding boxes by regressing their position and size, as well as classifying the object located within the
bounding box. It allows for deriving information about position, count, and sizes of the detected objects.
This process is essential for tasks where understanding spatial distributions and feature characteristics,
such as those of virus particles within a micrograph, is critical.
</p>
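<p>As an illustration of the classification case (assuming PyTorch; the feature size of 512 is an arbitrary choice), the model outputs a probability distribution over the predefined classes:</p>
<pre><code>import torch

num_classes = 2        # e.g. C = {"good", "bad"}
feature_dim = 512      # size of the feature vector produced by the backbone

head = torch.nn.Linear(feature_dim, num_classes)
features = torch.randn(1, feature_dim)    # toy features of one micrograph

logits = head(features)
probs = torch.softmax(logits, dim=1)      # probability distribution over C
print(probs)                              # e.g. tensor([[0.73, 0.27]]), sums to 1
</code></pre>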
<p><b>Image to Image</b>
Tasks are defined by an image input as well as an output also in the form of an image. Common examples
involve transforming the input image into a new image representation. These tasks are fundamental in
various applications within <abbr title="Electron Microscopy">EM</abbr>, where enhancing, restoring, or
analyzing images is crucial for extracting valuable information from <abbr
title="Electron Microscopy">EM</abbr> data.
<br>In denoising, the noisy input image is translated into a noise-free version. In super-resolution, a
low-resolution micrograph is translated into a high-resolution micrograph, thereby enhancing the detail
and clarity of the observed structures. Lastly, in segmentation, the input micrograph is translated into a
segmented image where different regions represent distinct components, such as certain cellular organelles
or virus particles. This involves the classification of each pixel in the input micrograph. Segments,
which are formed by adjacent groups of uniformly classified pixels, are typically labeled, providing a
clear distinction between different parts of the sample.
</p>
<p><b>2D to 3D</b>
Tasks are characterized by their process of converting multiple two-dimensional (2D) images into a
three-dimensional (3D) representation. These tasks are essential in various fields, such as structural
biology and material science, where understanding the 3D structure of samples from 2D projections is
crucial. By integrating information from multiple 2D projections, these methods aim to produce an accurate
and detailed 3D representation of the sample, enhancing our understanding of its spatial organization and
functional features.
<br>Common examples correspond to <abbr title="Electron Tomography">ET</abbr>, Subtomogram Averaging, and
Single Particle Reconstruction. In the case of <abbr title="Electron Tomography">ET</abbr> and Subtomogram
Averaging, the input is defined by one or multiple tilt series. For Single Particle Reconstruction, the
input corresponds to a set of picked particles.
</p>
<h4>Model Architecture</h4>
<p>A model in the context of <abbr title="Deep Learning">DL</abbr> is a learnable function approximation
based on a predefined set of trainable parameters (often also referred to as "model weights") and
non-linear activation functions. The term "model" is often used interchangeably with the term "neural
network" or "<abbr title="Deep Neural Network">DNN</abbr>". The learned function of a model is able to
approximate the input-output dependencies of a set of training data.<br>
How well a function can be approximated usually depends on the model's architecture and number of
trainable parameters. This is often referred to as the "capacity" of the model. Furthermore, a model's
ability to generalize—i.e., to apply learned knowledge from the training data to new, unseen data—reflects
its effectiveness and robustness. This then supports the automated analysis and interpretation of input
<abbr title="Electron Microscopy">EM</abbr> images for tasks such as those introduced previously.
</p>
<p><b>Backbone</b>
The backbone of a model is the core component responsible for feature extraction and processing from the
input data. It acts as the core architecture upon which a model is built, enabling the extraction of
meaningful representations that can be used to perform specific tasks. The selection of a backbone is
primarily determined by the nature of the input data and the type of task at hand.
<br>Furthermore, the choice of backbone must balance the parameter-to-data ratio. When working with
limited
datasets, it is crucial to use a model with a capacity that matches the amount of available data to avoid
overfitting. Overfitting occurs when a model learns the training data too well, capturing noise and
specific patterns that do not generalize to new, unseen data. To mitigate this, the model's complexity
should be controlled according to the data's size and quality.
<br>Different backbone architectures, such as <abbr title="Multilayer Perceptron">MLP</abbr>, <abbr
title="Convolutional Neural Network">CNN</abbr>, and <abbr title="Vision Transformer">ViT</abbr>, offer
distinct advantages based on the data characteristics and computational constraints. Each type of backbone
is designed to handle specific aspects of data processing, making the choice of backbone a critical factor
in the overall model performance.
<br><em>MLPs</em> are primarily used for integrating structured or tabular data and are not typically
suited
for processing image data due to their high computational demands and risk of overfitting. They are,
however, effective as classification heads when combined with different feature extractors.
<br><em>CNNs</em> are specifically designed for processing image data. They excel at capturing local
features
and are invariant to translations in image space. Therefore, they are particularly effective for tasks
involving spatial relationships within images, making them ideal for <abbr
title="Electron Microscopy">EM</abbr> image analysis.
<br>Lastly, while Transformers were originally developed for natural language processing, they were
adapted
into so-called <em>ViTs</em> to handle image data by processing image patches through self-attention
mechanisms. They are effective at capturing global contexts and are well-suited for large datasets, where
they usually outperform standard <abbr title="Convolutional Neural Network">CNN</abbr> architectures.
Variants like Swin Transformers [2] and Data-efficient Image Transformers (DeiTs)
[3] offer additional improvements for specific tasks.
</p>
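<p>A sketch of using a CNN backbone as a feature extractor, assuming PyTorch and torchvision (the choice of ResNet-18 is illustrative, not a toolbox requirement):</p>
<pre><code>import torch
import torchvision

backbone = torchvision.models.resnet18(weights=None)  # CNN backbone
backbone.fc = torch.nn.Identity()   # drop the built-in classification head

images = torch.randn(4, 3, 224, 224)   # a toy batch of 4 RGB images
features = backbone(images)            # feature vectors of shape (4, 512)
</code></pre>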
<p><b>Task Specific</b>
The task-specific architecture of a model encompasses the design and arrangement of its components,
including the types of layers, their arrangement, and the activation functions used, all of which define
how the model processes input data to generate output.
<br>Different tasks within the realm of <abbr title="Deep Learning">DL</abbr> necessitate the use of
tailored
architectures to effectively address the unique challenges posed by each task category. This ensures the
model is capable of accurately interpreting and processing the data to produce meaningful and reliable
outcomes. Here, the aforementioned three task groups of Image to Value(s), Image to Image, and 2D to 3D
have a significant impact on the model's architecture.
<br><em>Image to Value(s)</em> tasks mainly follow the use of a feature encoder (backbone) and a
task-specific prediction head to tailor the features extracted by the backbone to better suit the specific
task at hand. This involves transforming the abstract, high-level features into task-relevant outputs.
<br><em>Image to Image</em> tasks generally utilize encoder-decoder architectures, like U-Net
[4]. The encoder's role is to process the input <abbr title="Electron Microscopy">EM</abbr> images and
compress them into a lower-dimensional, abstract
representation within the embedding/latent space. The decoder takes the compressed representation from the
encoder and reconstructs it back to the original spatial dimensions.
<br>Finally, <em>2D to 3D</em> tasks are inherently complex due to their diverse interpretations and
approaches. They can be interpreted and solved in different fashions, such that some approaches leverage
architectures similar to Image to Value(s) task architectures and others leverage Image to Image-like
architectures. One method [5] involves using optimization grids where no actual
<abbr title="Deep Neural Network">DNN</abbr> is employed; instead, the 3D reconstruction is directly
optimized. Another approach utilizes scalable data structures for efficient 3D representation
[6]. Additionally, some techniques employ standard <abbr title="Multilayer Perceptron">MLP</abbr>-based
architectures to estimate the density at specific
positions in a 3D grid [7,8]. Alternatively, other
methods
adapt Image to Image models, using a 3D decoder to reconstruct the 3D model from the encodings of multiple
2D images.
</p>
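<p>As a small illustration of the Image to Image case, a toy encoder-decoder in PyTorch (without the skip connections of a full U-Net [4]):</p>
<pre><code>import torch
from torch import nn

# Encoder: compress the input image into a lower-dimensional representation.
encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # 64x64 to 32x32
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 32x32 to 16x16
    nn.ReLU(),
)
# Decoder: reconstruct an output image at the original spatial resolution.
decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2),    # 16x16 to 32x32
    nn.ReLU(),
    nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2),     # 32x32 to 64x64
)

micrograph = torch.randn(1, 1, 64, 64)    # one toy grayscale input image
output = decoder(encoder(micrograph))     # same spatial size as the input
</code></pre>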
<p>By meticulously defining tasks and selecting appropriate model architectures, researchers can optimize
<abbr title="Deep Learning">DL</abbr> applications for electron microscopy and other advanced imaging
techniques.
</p>
<p></p>
<p><small><i>[1] Yuichi Yokoyama, Tohru Terada, Kentaro Shimizu, Kouki Nishikawa, Daisuke Kozai, Atsuhiro
Shimada, Akira Mizoguchi, Yoshinori Fujiyoshi, and Kazutoshi Tani. Development of a deep
learning-based method to identify “good” regions of a cryo-electron microscopy grid. Biophysical
Reviews, 12:349–354, 2020.</i></small></p>
<p><small><i>[2] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining
Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of
the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021.</i></small></p>
<p><small><i>[3] Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and
Hervé Jégou. Training data-efficient image transformers & distillation through attention. In
International conference on machine learning, pages 10347–10357. PMLR, 2021.</i></small></p>
<p><small><i>[4] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for
biomedical image segmentation. In Medical image computing and computer-assisted intervention–MICCAI
2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18,
pages 234–241. Springer, 2015.</i></small></p>
<p><small><i>[5] Animesh Karnewar, Tobias Ritschel, Oliver Wang, and Niloy Mitra. ReLU fields: The little
non-linearity that could. In ACM SIGGRAPH 2022 conference proceedings, pages 1–9, 2022.</i></small></p>
<p><small><i>[6] Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, and Angjoo Kanazawa. Plenoctrees
for real-time rendering of neural radiance fields. In Proceedings of the IEEE/CVF International
Conference on Computer Vision, pages 5752–5761, 2021.</i></small></p>
<p><small><i>[7] Hannah Kniesel, Timo Ropinski, Tim Bergner, Kavitha Shaga Devan, Clarissa Read, Paul
Walther, Tobias Ritschel, and Pedro Hermosilla. Clean implicit 3d structure from noisy 2d
stem images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern