---
title: "ETC5543 - Business analytics creative activity - S1 2023"
subtitle: "Internship Report"
author: "Nishtha Arora"
date: "2023-06-09"
output:
  bookdown::html_document2:
    base_format: rmdformats::readthedown
    keep_md: no
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(cache = FALSE, warning = FALSE, message = FALSE, fig.height = 6, fig.width = 4)
```
```{r libraries, include=FALSE}
library(rmdformats)
library(readr)
library(themes360info)
library(readxl)
library(tidyverse)
library(hrbrthemes)
library(ggthemes)
library(lubridate)
library(sf)
library(rgeos)
library(pdftools)
library(scales)
library(dumbbell)
library(RColorBrewer)
library(DiagrammeR)
library(kableExtra)
```
# Abstract {-}
For decades, computer-based reporting has been an integral part of journalism. It draws on public records, databases and other private and public data sources to investigate patterns, trends and anomalies in the data collected. Integrating data analysis into reporting brings its own challenges, such as data manipulation, wrangling and access to platforms that support visualization and reorganization (Halevy & McGregor, 2012). The aim of this project is to support the journalism team of authors and editors with compelling visualizations that back their claims and research, or to create a visual analysis related to the selected topic. The first package, **'Decriminalizing Suicide'**, focuses on various aspects covered by the authors. One of them is India's Mental Health Act 2017, where different data sources give opposite results (an increase in one and a decrease in the other after 2017); this is discussed in the report. The second package, **'Policing the Police'**, takes a generic angle, covering topics such as shootings around the world and the changing trust in police. The project uses data wrangling, exploration, PDF scraping, spatial analysis and basic tidyverse functions in R, with the 'themes360info' package providing the theme. The report is divided into four main sections: Introduction, Aim, Methodology and Results/Learnings. For both packages, the workflow covers the initial analysis of the topic, the shortlisted and selected visualizations with the 360info theme applied, the reasons for selecting or rejecting each visualization, and the challenges faced during the process.
# About {-}
[*360info*](https://360info.org) is a not-for-profit, open-access agency that provides global information on the world's issues and possible solutions to them. Its content is distributed to re-publishers without charge under Creative Commons.
The published content is research-based. Each week a special report focuses on a global problem and consists of 5-10 articles covering different aspects of that problem. The articles are contributed by academics from various fields of study, depending on the topic.
Each report is supported by visuals, which can be images, graphics or interactives. Storytelling is strengthened by data-driven analysis, so this internship gave me the chance to work in the data and digital storytelling team, producing data visualizations in collaboration with the authors and editors.
All published work is reproducible for media partners and is released under Creative Commons licences, which suit artistic, educational and entertainment works. 360info uses the Creative Commons Attribution 4.0 licence because it allows a user's rights under the licence to be reinstated if the user corrects a violation within 30 days of discovering it.
Special thanks for their guidance go to the project mentors: Mr. Damjan Vukcevic, Associate Professor, Monash University, Australia, and Mr. James Goldie, Data and Digital Story-Telling Lead, 360info.org, Monash University, Caulfield, Australia.
```{r image1, fig.align='right', out.width="20%", echo=FALSE, fig.pos="bottom"}
knitr::include_graphics("images/images.png")
```
# Background and motivation
**Suicide** is a worldwide public health problem: there were over 700,000 deaths from suicide worldwide in 2019. Over time, a number of theories have emerged about whether decriminalizing suicide is a boon or a bane. Will it increase the suicide rate or decrease it? It may decrease overall rates, because people can then talk about suicide openly, which improves mental health and leads to fewer suicides; or it may increase the rate of suicide attempts. According to the WHO, 20 countries still criminalize suicide (World Health Organization: WHO, 2021).
```{r image2, fig.align='center', out.width="50%", echo=FALSE}
knitr::include_graphics("images/26119-gettyimages-145903640---2_source_file.jpg")
```
British common law held that a person has no right to take their own life because it belongs to the state. This influenced many former British colonies, such as Kenya, which still criminalize suicide even after colonization ended. The Christian commandment 'Thou shalt not kill' is taken to mean that one should not kill oneself at will, and suicide is a sin under Sharia law in the Islamic tradition (Ochuku et al., 2022).
With the advancement of science in the 19th and 20th centuries, it was discovered that suicidal tendencies are also caused by biological factors, and countries across Europe and North America revoked the laws that criminalized suicide. As the years went by and awareness increased, policies such as the Convention on the Rights of Persons with Disabilities and the World Health Organization Mental Health Action Plan 2020--2030 prompted various countries to decriminalize suicide.
Suicides result from a number of causes, ranging from abuse, loss, loneliness and the use of intoxicants to financial issues. All of these can lead to mental breakdown. It is safe to say that potential suicide victims go through a mental health issue, although the reverse is not necessarily true. Mental health issues come with stress, anxiety or depression and are often linked to suicidal feelings or behaviour, but they might not be the only cause of suicide. The relationship between mental health and suicide is complex.
Theories such as 'criminalizing suicide prevents people from reaching out for help, which results in an increase in the suicide rate' or 'criminalizing suicide would decrease suicide attempts and hence lower the suicide rate' are up for debate.
The concept of **Policing the Police** has emerged recently because, historically, law enforcement authorities overlooked and allowed misconduct by the police, owing to limited resources and a lack of external oversight. No check was kept on the police, which led to a number of reforms and protests. An example is the death of George Floyd, an African-American man murdered by a police officer in Minneapolis, Minnesota, after Floyd was suspected of using a counterfeit twenty-dollar bill.
```{r, image2_b, fig.align='center', out.width="50%", echo=FALSE}
knitr::include_graphics("images/ezgif.com-webp-to-jpg copy.jpg")
```
The police are responsible for our safety and we rely on them for protection, but they cannot always be trusted. 'Police accountability' is up for debate around the world. With about 1440 cases recorded against police in England and Wales during 2019-2020, 3.4% of complaints against law enforcement officers in Australia involving racism and discrimination, and mass shootings in the US, police integrity checks and punishments are being discussed around the world.
Certain plans are being implemented for this purpose, such as independent investigations per officer, body cameras that provide proof of misconduct, public surveys and stricter punishments. The argument is that those who are supposed to protect us must also be overseen by a body that keeps the actions of law enforcement in check.
There are debates about increasing oversight of the police to improve trust and accountability; some argue that more oversight may bring down police morale and affect police efficiency.
Here, it is very important to discover the instances where the police are involved and need to be checked. It is also important to discover the factors that might lead to police misbehaviour, whether corruption, killings or something else.
# Objectives and Significance
```{r image3, echo = FALSE, fig.show = "hold", out.width = "50%", fig.align = "default"}
knitr::include_graphics(
c(
"images/whatsapp-image-2019-01-29-at-13-46-16.jpeg",
"images/CAIR-Who-is-policing-the-police-sign-1.jpg"
)
)
```
For every completed suicide, 20 more attempts are made. Identifying potential suicide victims through these attempts can lead to help-seeking and suicide prevention, but criminalizing suicide hinders help-seeking and also results in inaccurate tracking of suicides.
For the instances and issues of police misconduct described above, addressing this issue is vital for the growth and well-being of the world. It is important to find the relationship between a country's status and police misconduct, i.e. how financial or political history may or may not affect occurrences in that area. Are there any patterns over the years for a particular area of the world? Can religion be a factor? Many such questions can be answered with the help of data.
These results need to be acted upon. On their basis, better reforms, bills and laws can be passed to ensure transparency and keep the police in check. Areas of improvement can be targeted and the use of force can be monitored around the world accordingly.
Hence, the objectives of this project are:
- To perform a needs analysis for the package 'Decriminalizing Suicide', i.e., work with the authors to make their articles stronger with statistical proof.
- To identify factors affecting and aspects related to 'Policing the Police' and work on a generic visualization, giving an idea of how things have changed over time.
- To discover differences among data sources and research which source is the aligning/trusted one.
- To tackle data gaps and anomalies.
There are gaps in data around the world for a particular year, set of years or season, which could result from a number of factors, such as a change in government, a sudden technological advancement, a low-income economy or major events.
# Methodology
Each package, i.e. 'Decriminalizing Suicide' and 'Policing the Police', has a special report containing about 8-9 articles covering different topics aligned with the package.
This project was limited to static plots rather than interactives. For published interactives, plotly is not the best tool and JavaScript is preferred, which we would have had to learn; given the time constraints and the aim of the project, static plots worked best.
- **Step 1:** Create initial visualizations aligned with specific draft articles.
- **Step 2:** Make more relevant plots and shortlist them, adding the 360info theme (themes360info).
- **Step 3:** Correct the flaws in one of the shortlisted plots and save this plot as a .png file.
- **Step 4:** Create a renv.lock file using the capsule package and save it to GitHub, for example:
```{r capsule, fig.align='left', out.width="90%", echo=FALSE}
knitr::include_graphics("images/1.png")
```
**Process carried out by the Data and Story-Telling Lead after finalizing the visualization**
- **Step 5:** To transfer the image directly to the code of the publication, the saved .png file is used.
- **Step 6:** To reproduce the code, the renv.lock file created above is used with the following code (installing the renv package and calling renv::restore to install the same R packages the interns used in their project).
```{r renv, fig.align='left', out.width="90%", echo=FALSE}
knitr::include_graphics("images/2.png")
```
- **Step 7:** Knit the project as usual.
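
The restore step can be sketched in R as follows. This is a minimal sketch, not the exact commands shown in the screenshots: `restore_project_library()` is a hypothetical helper written for this report, and it assumes the lockfile sits in the project root under the name `renv.lock`. The only renv call used is `renv::restore()`.

```r
# Hypothetical helper (not part of the original project): restore the package
# library recorded in renv.lock, but only when the lockfile and renv exist.
restore_project_library <- function(lockfile = "renv.lock") {
  if (!file.exists(lockfile)) {
    message("No lockfile found at: ", lockfile)
    return(invisible(FALSE))
  }
  if (!requireNamespace("renv", quietly = TRUE)) {
    message("Install renv first: install.packages(\"renv\")")
    return(invisible(FALSE))
  }
  # renv::restore() installs the package versions listed in the lockfile
  renv::restore(lockfile = lockfile, prompt = FALSE)
  invisible(TRUE)
}
```

Calling `restore_project_library()` before knitting would install the recorded package versions; the guard clauses keep it from failing when the lockfile or renv is missing.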
# Data, Results and Discussion
## Decriminalizing Suicide
### Initial Visualizations
#### Visualization 1: A generic visualization for the package [*Decriminalizing suicide - Criminalising suicide only makes it worse*](https://360info.org/criminalising-suicide-only-makes-it-worse/).
Data source:
- [*World wide suicide rates: Our World in Data*](https://ourworldindata.org/suicide)
- [*Human Development Index: UNDP*](http://hdr.undp.org/en/composite/HDI)
HDI (Human Development Index) is a composite statistic of life expectancy, mean years of schooling, expected years of schooling and per capita income. These indicators are used to classify countries into four tiers of human development.
```{r class, echo=FALSE}
classification_hdi <- data.frame(
  HDI_Value_2019 = c("0.8-1", "0.7-0.8", "0.55-0.7", "0-0.55"),
  Development_Index = c("Very High HDI", "High HDI", "Medium HDI", "Low HDI")
)
kable(classification_hdi, caption = "Range of HDI values for Human Development 2020", booktabs = T) |>
  kable_styling(
    latex_options = c("striped", "hold_position"),
    full_width = T,
    position = "center",
    font_size = 12,
    fixed_thead = T
  )
```
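The tiers in the table can also be assigned programmatically. The sketch below uses base R's `cut()`. `classify_hdi()` is a hypothetical helper, the example values are made up, and treating each range as closed on the left (so that 0.55 counts as Medium and 0.8 as Very High) is an assumption about how the boundaries are handled.

```r
# Hypothetical helper: map HDI values to the four tiers in the table above.
# Intervals are closed on the left, so 0.55 is "Medium HDI" and 0.8 is
# "Very High HDI" (an assumption about boundary handling).
classify_hdi <- function(hdi_value) {
  cut(
    hdi_value,
    breaks = c(0, 0.55, 0.7, 0.8, 1),
    labels = c("Low HDI", "Medium HDI", "High HDI", "Very High HDI"),
    right = FALSE,
    include.lowest = TRUE
  )
}

# Made-up values for illustration only
example_hdi <- c(0.394, 0.55, 0.699, 0.8, 0.962)
data.frame(
  HDI_Value = example_hdi,
  Development_Index = classify_hdi(example_hdi)
)
```

Deriving the tier from the value would avoid relying on the row ranges of the spreadsheet, at the cost of having to agree on the boundary rule.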
```{r cleaning, echo=FALSE}
read_data <- function(range, development_status) {
read_excel("data/InitialA1/HDIstatus.xlsx",
range = range,
col_names = FALSE) |>
rename(
HDI_rank_2021 = ...1,
Country = ...2,
HDI_Value = ...3,
Life_expectancy = ...5,
Expected_years_of_schooling = ...7,
Mean_years_of_schooling = ...9,
GNI_per_capita = ...11,
GNI_rank_minus_HDI_rank = ...13,
HDI_rank_2020 = ...15
) |>
select(
HDI_rank_2021,
Country,
HDI_Value,
Life_expectancy,
Expected_years_of_schooling,
Mean_years_of_schooling,
GNI_per_capita,
GNI_rank_minus_HDI_rank,
HDI_rank_2020
) |>
mutate(Degree_of_Human_Development = development_status)
}
very_high_hdi <-
read_data("A9:O74", "VERY HIGH HUMAN DEVELOPMENT")
high_hdi <-
read_data("A76:O128", "HIGH HUMAN DEVELOPMENT")
medium_hdi <-
read_data("A130:O166", "MEDIUM HUMAN DEVELOPMENT")
low_hdi <-
read_data("A168:O200", "LOW HUMAN DEVELOPMENT")
hdi <- bind_rows(very_high_hdi, high_hdi, medium_hdi, low_hdi)
write_csv(hdi, "data/InitialA1/HDIStatus2.csv")
tidied_hdi_data <- read_csv("data/InitialA1/HDIStatus2.csv") |>
rename("Entity" = Country)
```
```{r reading data_suicides, echo=FALSE}
suicide_rates <- read_csv("data/InitialA1/suicide-death-rates.csv")
# setdiff(data2$Entity, data3$Entity)
# rename countries to align with other data set
# American Samoa, Northern Ireland and South Sudan are not renamed: they are
# different entities from Samoa, Ireland and Sudan
change <- suicide_rates |>
  mutate(
    Entity = recode(
      Entity,
      "Bolivia" = "Bolivia (Plurinational State of)",
      "Brunei" = "Brunei Darussalam",
      "Cape Verde" = "Cabo Verde",
      "Cote d'Ivoire" = "Côte d'Ivoire",
      "Democratic Republic of Congo" = "Congo (Democratic Republic of the)",
      "Eswatini" = "Eswatini (Kingdom of)",
      "Iran" = "Iran (Islamic Republic of)",
      "Laos" = "Lao People's Democratic Republic",
      "Micronesia (country)" = "Micronesia (Federated States of)",
      # "Korea (Republic of)" is South Korea, not North Korea
      "South Korea" = "Korea (Republic of)",
      "Palestine" = "Palestine, State of",
      "Russia" = "Russian Federation",
      "Syria" = "Syrian Arab Republic",
      "Timor" = "Timor-Leste",
      "Venezuela" = "Venezuela (Bolivarian Republic of)",
      "Vietnam" = "Viet Nam"
    )
  )
# calculate the average death rate over 2008 to latest
selected_suicide_rates <- change |>
filter(Year > 2007) |>
group_by(Entity) |>
summarise(avg_rate = mean(`Deaths - Self-harm - Sex: Both - Age: Age-standardized (Rate)`))
join <- full_join(selected_suicide_rates, tidied_hdi_data) |>
arrange(desc(avg_rate)) |>
na.omit() |>
rename(`Development status` = Degree_of_Human_Development) |>
mutate(avg_rate = round(avg_rate, digits = 2))
plot1 <- join |> head(30)
```
```{r plot1, echo=FALSE, fig.width= 7, fig.height=5, fig.cap="Top 30 countries by average suicide rate (2008-2019)"}
ggplot(plot1,
aes(
x = reorder(Entity, -avg_rate),
y = avg_rate,
fill = `Development status`
)) +
geom_col() +
geom_text(
aes(label = avg_rate),
vjust = 2,
colour = "white",
size = 1.5
) +
theme(
legend.position = "bottom",
axis.text.x = element_text(
angle = 45,
vjust = 0.5,
hjust = 1
),
plot.title = element_text(face = "bold"),
plot.background = element_rect(fill = "#B2E3FF")
) +
labs(
x = substitute(paste(bold("Country"))),
y = substitute(paste(bold(
"Suicide rate per 100,000 people"
)))) +
scale_fill_manual(values = c("#A0331C", "#1C56A0", "#4B902F", "#635A61"))
```
The visualization above takes the countries with the highest average suicide rates per 100,000 people for the years 2008-2019 and plots them. Each country is then compared with its Human Development Index status.
Reasons for REJECTION:
- Two data sets from different sources were used: UNDP (HDI data) and OWID (suicide rates).
- The HDI data records 2021 values (and the 2020 rank), whereas OWID data was available only until 2019.
- The status of all countries is not visible and cannot be shown because of limited space on the plot.
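
Because the two sources spell country names differently, it helps to list the names that fail to match before joining. The following is a minimal sketch in base R with made-up toy data; the data frames and the `name_map` lookup are hypothetical and stand in for the OWID and UNDP tables used above.

```r
# Toy data, made up for illustration; only the Entity column mirrors the report
suicide_rates_toy <- data.frame(
  Entity = c("Russia", "Vietnam", "Australia"),
  avg_rate = c(25.1, 7.3, 11.0)
)
hdi_toy <- data.frame(
  Entity = c("Russian Federation", "Viet Nam", "Australia"),
  HDI_Value = c(0.82, 0.70, 0.95)
)

# Names present in one source but not the other need a manual mapping
unmatched_in_rates <- setdiff(suicide_rates_toy$Entity, hdi_toy$Entity)
unmatched_in_hdi <- setdiff(hdi_toy$Entity, suicide_rates_toy$Entity)

# Apply the mapping through a named lookup vector, then join on Entity
name_map <- c("Russia" = "Russian Federation", "Vietnam" = "Viet Nam")
needs_rename <- suicide_rates_toy$Entity %in% names(name_map)
suicide_rates_toy$Entity[needs_rename] <-
  unname(name_map[suicide_rates_toy$Entity[needs_rename]])
joined_toy <- merge(suicide_rates_toy, hdi_toy, by = "Entity")
```

Checking `unmatched_in_rates` and `unmatched_in_hdi` after each change makes it visible when a rename merges two entities that should stay separate.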
#### Experiment for further visualizations
For further visualizations, a more authoritative data source was recommended, so global suicide rate data was extracted from the [*World Health Organization*](https://www.who.int/data/gho/data/themes/mental-health/suicide-rates), which contains global suicide rates from 2000 to 2019.
```{r who_data, echo=FALSE}
who_data <- read_csv("data/InitialA1/data-2.csv") |>
filter(Dim1 == "Both sexes") |>
select(SpatialDimValueCode, Location, Period, Dim1, FactValueNumeric)
```
**Comparison of data from OWID and WHO by selecting a random country, say Australia.**
```{r comaprison, echo=FALSE}
Australia_WHO <- who_data |>
filter(Location == "Australia") |>
select(Period, FactValueNumeric) |>
rename(suicide_rate_WHO = FactValueNumeric,
Year = Period)
Australia_OWID <- change |>
filter(Entity == "Australia") |>
filter(Year > 1999) |>
select(Year,
`Deaths - Self-harm - Sex: Both - Age: Age-standardized (Rate)`) |>
rename(suicide_rate_OWID = `Deaths - Self-harm - Sex: Both - Age: Age-standardized (Rate)`)
join_india <-
full_join(Australia_OWID, Australia_WHO, by = "Year") |>
group_by(Year) |>
arrange(desc(Year)) |>
head(10)
kable(join_india, caption = "Comparison of Data Sources- Australia's Suicide rate", booktabs = T) |>
kable_styling(latex_options = c("striped", "hold_position"),
font_size = 14) |>
column_spec(3, color = "#07034D") |>
column_spec(2, color = "#023707")
```
It is observed that the WHO data values are higher than OWID values for the years 2014 and after, and lower for the years 2010-2014.
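The same comparison can be expressed as a per-year difference between the two sources. The sketch below uses made-up numbers, not the real OWID or WHO figures; only the column names follow the table above.

```r
# Made-up rates for illustration; not the real OWID or WHO figures
owid_toy <- data.frame(
  Year = 2012:2015,
  suicide_rate_OWID = c(11.0, 11.2, 11.5, 11.9)
)
who_toy <- data.frame(
  Year = 2012:2015,
  suicide_rate_WHO = c(10.6, 11.0, 11.8, 12.4)
)

comparison <- merge(owid_toy, who_toy, by = "Year")
# Positive values mean the WHO figure is higher than the OWID figure
comparison$difference <-
  comparison$suicide_rate_WHO - comparison$suicide_rate_OWID
comparison$higher_source <- ifelse(
  comparison$difference > 0, "WHO",
  ifelse(comparison$difference < 0, "OWID", "equal")
)
comparison
```

A signed difference column makes it easy to see in which year the two sources switch order.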
#### Visualization 2: This visualization examines data gaps and redundancies in the dataset and was to be paired with [*What a suicide database registry should look like*](https://360info.org/what-a-suicide-registry-database-should-look-like/)
Data source: [*Global Suicide Rates WHO*](https://www.who.int/data/gho/data/themes/mental-health/suicide-rates)
Here, the objective of the visualization is to check for significant errors in the data and to show why no data source can be fully trusted. This is done by observing outliers in the data set. *Hawkins described an outlier as a point that deviates so much from the other observations that it arouses suspicion that a different mechanism generated it* (G, 1987).
These data points can deviate for a number of reasons, for example variability in measurement, tampering with data, misreporting, under-reporting, duplication, sampling errors, unusual events, or human error such as recording incorrect data or miskeying at data entry.
Outliers are highly underestimated! A small proportion of outliers can affect even a simple analysis, inflating error rates and distorting statistical estimates, and removing them can improve accuracy significantly (Osborne & Overbay, 2004).
Initially, the complete data set was examined for outliers, but because it is large, overlapping text and squeezed observations made the visualization hard to read. Hence, only countries with significant outliers were selected for the visualization.
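Countries with outliers could also be picked out programmatically rather than by eye. The sketch below applies the usual box-plot rule (values more than 1.5 times the interquartile range beyond the quartiles) within each country. The data are made up, and `is_outlier()` is a hypothetical helper; the country list in the report was not necessarily chosen this way.

```r
# Made-up rates for two hypothetical countries, for illustration only
rates_toy <- data.frame(
  Location = rep(c("Country A", "Country B"), each = 6),
  FactValueNumeric = c(10, 11, 12, 11, 10, 30, 5, 6, 5, 6, 5, 6)
)

# Hypothetical helper: TRUE for values beyond 1.5 * IQR from the quartiles
is_outlier <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# ave() applies the rule separately within each country
rates_toy$outlier <- as.logical(
  ave(rates_toy$FactValueNumeric, rates_toy$Location, FUN = is_outlier)
)
countries_with_outliers <- unique(rates_toy$Location[rates_toy$outlier])
```

The resulting vector could then replace a hand-typed list of countries in a `filter()` call.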
```{r plot2, echo=FALSE, fig.height=5, fig.width=7, fig.cap="Irregularity in Data Collection"}
viz4_who_data <- who_data |> filter(
Location %in% c(
"Kiribati",
"Central African Republic",
"Latvia",
"Republic of Korea",
"Fiji",
"Rwanda",
"Poland",
"Comoros",
"Uzbekistan",
"Bosnia and Herzegovina",
"Grenada",
"Niger",
"Cuba",
"Equatorial Guinea",
"Burkina Faso",
"Samoa",
"Latvia",
"Sao Tome and Principe",
"Honduras",
"Lebanon",
"Maldives",
"Bahamas",
"Timor-Leste",
"Iraq",
"Dominican Republic",
"Iran (Islamic Republic of)",
"Brazil",
"Bolivia (Plurinational State of)",
"The former Yugoslav Republic of Macedonia",
"Portugal",
"Belize",
"Serbia",
"Mali",
"Argentina",
"United Republic of Tanzania",
"Democratic People's Republic of Korea"
)
)
viz4_who_data |>
ggplot(aes(x = Location, y = FactValueNumeric, fill = Location)) +
geom_boxplot() +
theme(legend.position = "none") +
scale_fill_viridis_d(alpha = 0.6) +
theme(
text = element_text(size = 8),
axis.text.x = element_text(angle = 45, hjust = 1),
axis.title = element_text(face = "bold"),
plot.title = element_text(
size = 14,
lineheight = 8,
face = "bold"
),
plot.background = element_rect(fill = "#B2E3FF")
) +
labs(x = "Country",
y = "Suicide rate")
```
Reasons for REJECTION:
- A box plot may be the best way to show data gaps, but it is not an easy plot for the public to read.
- Not all countries were covered; only those with significant outliers are shown.
#### Visualization 3: Trend of suicide rates in South Asian countries. The suicide rates in South Asian countries are reported to be between 0.43 and 331.0 per 100,000 population, which is high compared to the world average.
This could be paired with any of the articles that mention a South Asian country, for example, Malaysia in [*Suicide is not a crime*](https://360info.org/suicide-is-not-a-crime/), Pakistan in [*With suicide not a crime, the real work begins*](https://360info.org/with-suicide-no-longer-a-crime-the-real-work-begins/), Bangladesh in [*Suicide is a mental health issue, not a crime*](https://360info.org/suicide-is-a-mental-health-issue-not-a-crime/) and a discussion on [*India's Mental Health act*](https://360info.org/how-india-continues-to-punish-those-who-attempt-suicide/). Sri Lanka is also mentioned in [*The alternatives that can help prevent suicide*](https://360info.org/the-alternatives-which-can-help-prevent-suicide/).
Data source: [*WHO*](https://www.who.int/data/gho/data/themes/mental-health/suicide-rates)
```{r viz3_data, include=FALSE}
#code only to spot and remove outliers
south_asian_c <- who_data |>
filter(
Location %in% c(
"India",
"Maldives",
"Afghanistan",
"Nepal",
"Bangladesh",
"Bhutan",
"Sri Lanka",
"Pakistan"
)
)
south_asian_c_clean <- south_asian_c |>
slice(-c(157))
```
```{r plot3, echo=FALSE, fig.height=5, fig.width=7, fig.cap="South Asian Countries"}
ggplot(south_asian_c_clean,
       aes(colour = Location, y = FactValueNumeric, x = Period)) +
  geom_line() +
  geom_point() +
  ylab("Suicide rate") +
  scale_colour_brewer(type = "qual", palette = "Dark2") +
  # theme_classic() resets every theme element, so it must come before the
  # custom background or the background is overwritten
  theme_classic() +
  theme(plot.background = element_rect(fill = "#B2E3FF"))
```
Reasons for REJECTION:
- The interface differs for each article, so it did not make sense to put this plot on a single page or the front page.
- Only the Sri Lanka trend seems interesting, but it is inconsistent and did not relate to the article content.
### Shortlisted Visualizations
#### Visualization 4: A time-series plot depicting the suicide rate trend before and after 2017, to pair with the article on [*India's Mental Health Act 2017*](https://360info.org/how-india-continues-to-punish-those-who-attempt-suicide/).
Data source : [*WHO*](https://www.who.int/data/gho/data/themes/mental-health/suicide-rates)
```{r plot4, out.width='90%', echo=FALSE, fig.cap="SUICIDE IN INDIA"}
india_suicide_rates <- who_data |>
  filter(Location == "India", Period > 2009) |>
  # Period already holds the calendar year, so no round-trip through Date is
  # needed (as.Date() with only "%Y" fills in today's month and day)
  mutate(year = as.numeric(Period))
plot_india <-
ggplot(india_suicide_rates, aes(x = year, y = FactValueNumeric)) +
geom_line(color = "brown") +
geom_point(color = "brown", size = 1.5) +
geom_vline(xintercept = 2017, linetype = "dashed") +
labs(
x = NULL,
y = "Rate per 100,000 people",
title = "SUICIDE IN INDIA",
subtitle = "Suicide rates in India have declined from 2010 to 2017 and then a sudden hike is observed",
    caption = paste(
      "**CHART:** Nishtha Arora & James Goldie, 360info",
      "**DATA:** World Health Organization",
      sep = "<br>"
    )
) +
scale_x_continuous(breaks = 2010:2021) +
ylim(9, 16) +
theme_360() +
theme(
legend.position = "none",
axis.title = element_text(face = "bold"),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank()
) +
annotate_360_light(
x = 2016.9,
y = 19,
label = paste(
"Suicide rates slightly increased",
"after the introduction of the ",
"Mental Healthcare Act in 2017.",
sep = "<br>"
) ,
hjust = 1,
size = 5
)
save_360plot(plot_india, "graphs/indiatimeseries.png")
knitr::include_graphics("graphs/indiatimeseries.png")
```
Reason for REJECTION:
- The results did not align with the author's article, which relied more on NCRB (National Crime Records Bureau) data, so a plot with NCRB data was made later.
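
One way to put a number on "before and after 2017" is to compare the average year-on-year change on either side of the policy year. The sketch below uses made-up rates, not the WHO or NCRB figures, and the variable names are hypothetical.

```r
# Made-up annual rates for illustration; not the WHO or NCRB figures
india_toy <- data.frame(
  year = 2013:2019,
  rate = c(11.0, 10.8, 10.6, 10.3, 10.0, 10.2, 10.4)
)
policy_year <- 2017

# Year-on-year change, labelled with the later year of each pair
yearly_change <- diff(india_toy$rate)
change_year <- india_toy$year[-1]

# Average change up to and including the policy year, and after it
mean_change_before <- mean(yearly_change[change_year <= policy_year])
mean_change_after <- mean(yearly_change[change_year > policy_year])
```

With only a few points on each side, such averages describe the trend in the plot but say nothing about what caused it.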
#### Visualization 5: India's state-wise suicide figures, to pair with the article [How India continues to punish those who attempt suicide.](https://360info.org/how-india-continues-to-punish-those-who-attempt-suicide/).
Data source:
- [*Data.gov*](https://data.gov.in/catalog/stateut-wise-distribution-suicides-causes)
- [*Geometry data at DIVA-GIS*](http://www.diva-gis.org/datadown)
```{r data_correction, echo= FALSE}
options(scipen = 999)
region_cases_2019 <-
read_csv("data/InitialA1/RS_Session_253_A_211.1.csv") |>
filter(str_detect(`State/UT`, "Total ", negate = TRUE)) |>
rename(wrong_total = Total) |>
rowwise() |> # total cases, wrong data
mutate(Total = sum(Male, Female, Transgender)) |>
select(`State/UT`, Total)
states_shape_sf <- read_sf("data/InitialA1/IND_adm/IND_adm1.shp")
#Correcting the data by manually looking at Id's as geometry was matching with wrong id's.
df_newid <- data.frame(
  id = c(
    2, 3, 4, 5, 7, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24,
    25, 26, 28, 29, 30, 31, 32, 33, 34, 35, 36, 1, 6, 8, 9, 10, 19, 27
  )
)
join_newid <- cbind(region_cases_2019, df_newid) |>
mutate(id = as.numeric(id))
states_merged <- inner_join(states_shape_sf, join_newid,
by = c("ID_1" = "id"))
```
```{r plot5, echo=FALSE, out.width='90%', warning=FALSE, message=FALSE, fig.cap="REGION-WISE: SUICIDE IN INDIA 2019"}
colors <- c('#C6B7F7', "#744BF7", "#6B96EC", "#103E99")
b <- c(0, 1000, 10000, 18000)
india_map <- ggplot() +
geom_sf(
aes(fill = Total),
data = states_merged,
color = "black",
linewidth = 0.25
) +
geom_sf_text(
data = states_merged,
aes(label = NAME_1),
size = 3,
color = "black",
fontface = "bold"
) +
coord_sf() +
scale_fill_fermenter(
palette = "YlGnBu",
direction = -1,
# trans = "log10",
labels = scales::label_number_si()
) +
labs(
title = "REGION-WISE: SUICIDE IN INDIA 2019",
subtitle = "No. of suicides were maximum in Andhra Pradesh and Arunachal Pradesh",
    caption = paste(
      "**CHART:** Nishtha Arora & James Goldie, 360info",
      "**DATA:** data.gov.in, DIVA-GIS",
      sep = "<br>"
    )
) +
theme_360() +
theme(
axis.title = element_text(face = "bold"),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank()
) +
xlab(NULL) +
ylab(NULL)
save_360plot(india_map, "graphs/indiamap.png")
knitr::include_graphics("graphs/indiamap.png")
```
Reasons for REJECTION:
- States overlapped on the map, and removing the overlaps would remove data.
- Two data sources were used, one for the rates and the other for state geometry. Because the "id" column in the geometry data did not match the "id" column in the rates data for several states, the ids had to be reassigned manually, which is not good practice.
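
An alternative to reassigning ids by hand is to join the two tables on a cleaned-up state name. The sketch below uses made-up rows in plain data frames; `normalise_name()` is a hypothetical helper, and `NAME_1` and `ID_1` mirror the shapefile columns used above. Whether the real state names line up after this cleaning has not been checked.

```r
# Made-up rows for illustration; NAME_1 and ID_1 mirror the shapefile columns
cases_toy <- data.frame(
  state = c("ANDHRA PRADESH", "Tamil Nadu ", "kerala"),
  Total = c(100, 200, 300)
)
shapes_toy <- data.frame(
  NAME_1 = c("Kerala", "Andhra Pradesh", "Tamil Nadu"),
  ID_1 = c(18, 2, 31)
)

# Hypothetical helper: lower-case and trim names so both sources share a key
normalise_name <- function(x) tolower(trimws(x))
cases_toy$key <- normalise_name(cases_toy$state)
shapes_toy$key <- normalise_name(shapes_toy$NAME_1)

merged_toy <- merge(shapes_toy, cases_toy, by = "key")
# Any state left unmatched still needs a manual fix
unmatched_states <- setdiff(cases_toy$key, shapes_toy$key)
```

Joining on names keeps the mapping independent of row order, and `unmatched_states` lists exactly which entries still need attention.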
### Selected Visualization
#### Visualization 6: Comparison of Visualization 4 with a similar plot made from data extracted from the National Crime Records Bureau (NCRB)
This is a time series plot based on NCRB data that visualizes the suicide rate in India over time and highlights 2017 (the year of India's Mental Healthcare Act, 2017).
Data source:
- [*NCRB*](https://ncrb.gov.in/sites/default/files/adsi_reports_previous_year/Table%202.1.pdf)
```{r read_pdf, echo=FALSE, warning=FALSE, message=FALSE}
ncrb_pdf <-
pdftools::pdf_text(pdf = "https://ncrb.gov.in/sites/default/files/adsi_reports_previous_year/Table%202.1.pdf") |>
str_split("\n")
```
```{r pdf_extraction, echo=FALSE, warning=FALSE, message=FALSE}
# The PDF is already loaded into ncrb_pdf by the read_pdf chunk above.
# Keep only the table rows (lines 11 to 41) of the first page.
# The lines are intentionally not squished: separate() below relies on the
# leading whitespace to fill the "extra" column.
NH_numbers <- data.frame(numbers = ncrb_pdf[[1]][11:41])
new <- NH_numbers |>
separate(
numbers,
into = c("extra", "id", "year", "count", "population", "rate"),
sep = "\\s+"
) |>
na.omit() |>
select(year, rate, count) |>
mutate(
year = str_remove(year, "[#@$]"),
# year = as.Date(paste0(year, "-07-01")),
count = as.numeric(count),
rate = as.numeric(rate),
year = as.numeric(year)
)
```
```{r plot6, echo=FALSE, out.width='90%', warning=FALSE, message=FALSE, fig.cap="SUICIDE IN INDIA"}
plot_ncrb <- ggplot(new,
aes(x = year, y = rate)) +
geom_line(color = "brown") +
geom_point(color = "brown", size = 1.5) +
geom_vline(xintercept = 2017, linetype = "dashed") +
labs(
x = NULL,
y = "Rate per 100,000 people",
title = "SUICIDE IN INDIA",
subtitle = "Suicide rates in India fell rapidly until 2016.",
caption = paste(
"**CHART:** Nishtha Arora & James Goldie, 360info",
"**DATA:** NCRB",
sep = "<br>"
)
) +
scale_x_continuous(breaks = 2010:2021) +
ylim(9, 16) +
theme_360() +
theme(
legend.position = "none",
axis.title = element_text(face = "bold"),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank()
)
# annotate_360_light(
# x = 2015,
# y = 10.2,
# label = paste("Suicide rates have rapidly increased after 2017.",
# sep = "<br>") ,
# hjust = 1,
# size = 5
# )
save_360plot(plot_ncrb, "graphs/indiatimeseries2.png")
knitr::include_graphics("graphs/indiatimeseries2.png")
```
The plot was created by extracting data from a PDF, and because that process took time, it was finalized after the article was published.
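The core of the extraction step is splitting each whitespace-delimited line of the PDF text into columns. The following is a minimal sketch of that idea in base R, assuming a row shaped like those in the extracted table; the line itself is made up and its numbers are placeholders, not NCRB figures.

```r
# A made-up line in the same shape as a row of the extracted table;
# the numbers are placeholders, not NCRB figures.
line <- "   1   2001#   100   1000.0   10.0"

# Splitting on runs of whitespace keeps an empty first piece for the
# leading spaces, which is why an "extra" column is needed.
pieces <- strsplit(line, "\\s+")[[1]]
names(pieces) <- c("extra", "id", "year", "count", "population", "rate")

# Strip footnote markers from the year, then convert to numbers.
year <- as.numeric(gsub("[#@$]", "", pieces[["year"]]))
rate <- as.numeric(pieces[["rate"]])
```

Because the split depends on the leading whitespace, trimming or squishing the lines first would shift every value one column to the left.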
## Policing the Police
The visualizations here are generic and relate to different aspects of 'Policing the Police', for example:
- Corruption/bribery
- Police shootings/encounters/drug wars/casualties
- Trust in police
This is because the package's publication date was in June, i.e. 2-3 weeks after the last day of the internship. The article drafts would only be created in early June; hence, the visualizations are not paired with specific articles for this package.
### Initial Visualizations
#### Visualizations 1 & 2
Data source:
- [*UNODC*](https://dataunodc.un.org/dp-crime-corruption-offences)
```{r data_unodc, echo=FALSE, message=FALSE, warning=FALSE}
corruption_UNODC <-
read_excel("data/policingthepolice/data_cts_corruption_and_economic_crime.xlsx") |>
slice(-c(1, 2)) |>
set_names(
c(
#colnames changed to setnames
"Iso3_code",
"Country",
"Region",
"Subregion",
"Indicator",
"Dimension",
"Category",
"Sex",
"Age",
"Year",
"Unit of measurement",
"VALUE",
"Source"
)
)
corruption_UNODC_clean <-
corruption_UNODC |> #use better data set names
filter(`Unit of measurement` == "Counts") |>
select(Iso3_code, Country, Category, VALUE, Year) |>
filter(Category %in% c("Corruption")) |>
mutate(VALUE = as.numeric(VALUE))
plot_global <- corruption_UNODC_clean |>
group_by(Country) |>
summarise(VALUE = mean(VALUE)) |>
arrange(desc(VALUE)) |>
head(10) |>
mutate(ToHighlight = ifelse(VALUE > 30000, "yes", "no"))
plot_usa <- corruption_UNODC_clean |>
filter(Country == "United States of America") |>
mutate(Year = as.numeric(Year))
```
```{r p1, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Viz1: Country Rank: 2013-2021", fig.height=5, fig.width=5}
plot_global |>
ggplot(aes(reorder(x = Country, -VALUE), y = VALUE, fill = ToHighlight)) +
geom_col() +
theme(legend.position = "none") +
theme_dark() +
xlab("Country") +
ylab("Average Corruption Count") +
theme(
plot.title = element_text(face = "bold", size = 18),
axis.title.x = element_text(face = "bold"),
axis.title.y = element_text(face = "bold"),
axis.text.x = element_text(angle = 35, face = "bold"),
legend.position = "none"
) +
scale_fill_manual(values = c("yes" = "#57AFD5", "no" = "lightblue"))
```
The plot above ranks countries by the average number of corruption cases reported over the years 2013-2021 and shows the top 10.
```{r p2, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="Viz2: Corruption Count: USA", fig.height=5, fig.width=5}
plot_usa |>
ggplot(aes(x = Year, y = VALUE)) +
geom_col(fill = "lightblue", width = 0.4) +
geom_line(linewidth = 1, colour = "#811B0F") +
ylab("Corruption Count") +
theme_dark() +
theme(
plot.title = element_text(face = "bold", size = 15),
axis.title.x = element_text(face = "bold"),
axis.title.y = element_text(face = "bold")
)
```
In the plot above, the USA was chosen arbitrarily; the plot shows the trend over the years in the number of corruption cases registered.
Reason for REJECTION:
- Although this is a legitimate data source, it gives only a count of 'corruption' cases and does not show what percentage of that corruption involves the police.
#### Visualizations 3 & 4
Data Sources:
- [*Washington Post*](https://www.washingtonpost.com/graphics/investigations/police-shootings-database/)
- [*Washington Post Github*](https://github.com/washingtonpost/data-police-shootings/tree/master/v1)
- [*Census data 2010-2020*](https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/counties/totals/)
- [*Census data 2020-2022*](https://www2.census.gov/programs-surveys/popest/datasets/2020-2022/counties/totals/)
- [*Stats/States code data github*](https://github.com/jasonong/List-of-US-States/blob/master/states.csv)
The plot below shows the rate at which police killings occurred in the United States of America over time.
```{r police_data_wrangling, echo=FALSE, message=FALSE, warning=FALSE}
options(scipen = 999)
co_est2020 <- read_csv("data/policingthepolice/co-est2020.csv") |>
select(
-c(
SUMLEV,
REGION,
DIVISION,
STATE,
COUNTY,
CTYNAME,
CENSUS2010POP,
ESTIMATESBASE2010,
POPESTIMATE042020,
POPESTIMATE2011,
POPESTIMATE2012,
POPESTIMATE2013,
POPESTIMATE2014,
POPESTIMATE2010
)
) |>
group_by(STNAME) |>
# summarise(summarise_if(is.numeric, mean))
summarise(
`2015` = mean(POPESTIMATE2015),
`2016` = mean(POPESTIMATE2016),
`2017` = mean(POPESTIMATE2017),
`2018` = mean(POPESTIMATE2018),
`2019` = mean(POPESTIMATE2019),
`2020` = mean(POPESTIMATE2020)
)
co_est2022_alldata <-
read_csv("data/policingthepolice/co-est2022-alldata.csv") |>
select(STNAME, POPESTIMATE2021, POPESTIMATE2022) |>
group_by(STNAME) |>
summarise(`2021` = mean(POPESTIMATE2021),
`2022` = mean(POPESTIMATE2022)) |>
select(-c(STNAME))
bind_pop <- cbind(co_est2020, co_est2022_alldata) |>
pivot_longer(cols = 2:9,
names_to = 'year',
values_to = 'population') |>
rename(state_name = STNAME) |>
mutate(year = as.numeric(year))
states <- read_csv("data/policingthepolice/states.csv") |>
rename(state_name = State,
state = Abbreviation)
fatal_police_shootings_data <-
read_csv("data/policingthepolice/fatal-police-shootings-data.csv") |>
mutate(year = year(date)) |>
select(gender, race, city, state, year)
state_shooting <-
left_join(fatal_police_shootings_data, states, by = "state") |>
select(-c(state, city)) |>
group_by(year, state_name) |>
summarise(count = n())
# Rate of fatal police shootings per 100,000 population, by state and year
join_pop_killings <-
full_join(bind_pop, state_shooting, by = c("state_name", "year")) |>
mutate(rate = (count / population) * 100000) |>
select(-c(population, count))
shots_usa_plot3 <- join_pop_killings |>
group_by(year) |>
na.omit() |>
summarise(avg_rate = mean(rate))
shots_usa_plot4 <- join_pop_killings |>
group_by(state_name) |>
summarise(rate = mean(rate)) |>
arrange(desc(rate)) |>
head(10)
```
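The rate used in these plots is the number of shootings divided by the population, scaled to 100,000 people. As a small worked illustration of that arithmetic, the sketch below assumes a population table with one row per county and sums those rows to get a state total before dividing. This is an alternative way of arriving at a state population, not the approach taken in the chunk above, and all names and numbers are hypothetical placeholders.

```r
# Hypothetical county-level rows; all numbers are placeholders.
counties <- data.frame(
  state_name = c("State A", "State A", "State B"),
  population = c(400000, 600000, 500000)
)
shootings <- data.frame(
  state_name = c("State A", "State B"),
  count = c(5, 1)
)

# Sum county populations to get one population figure per state.
state_pop <- aggregate(population ~ state_name, data = counties, FUN = sum)

# Join counts to populations and scale to a rate per 100,000 people.
rates <- merge(state_pop, shootings, by = "state_name")
rates$rate <- rates$count / rates$population * 100000
```

Here "State A" has 5 shootings among 1,000,000 people, giving a rate of 0.5 per 100,000, and "State B" has 1 among 500,000, giving 0.2.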
```{r p3, echo=FALSE, message=FALSE, warning=FALSE, fig.cap="USA Killings: Rate vs Year", fig.height=5, fig.width=5}
plot4_shootings <- shots_usa_plot3 |>
ggplot(aes(x = year, y = avg_rate)) +
geom_line(color = "#6F0269") +
geom_point() +
geom_vline(xintercept = 2021, linetype = "dashed") +