<!DOCTYPE html>
<html>
<head>
<title>Politics-Conspiracies (Methods/Reflection)</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" href="CSSstyle.css">
</head>
<body>
<h1 id="title-index">Politics-Conspiracies (Methods/Reflection)</h1>
<nav id="menu">
<a href="index.html">
<div class="button">Home</div>
</a>
<a href="conspiracies.html">
<div class="button">Conspiracies</div>
</a>
<a href="analysis.html">
<div class="button">Analysis</div>
</a>
<a href="methods.html">
<div class="button">Reflection/Methods</div>
</a>
<a href="about.html">
<div class="button">About</div>
</a>
<a href="https://github.com/nhammer514/textfiles-politics">
<div class="button">GitHub <img alt="github icon"
src="https://logos-download.com/wp-content/uploads/2016/09/GitHub_logo.png"
width="15" /></div>
</a>
</nav>
<div class="content1">
<h2 id="Analysis-title" style="text-align: center;">Reflection</h2>
<p>&emsp;Building a website about conspiracy theories using <b>XML</b>, <b>XQuery</b>, and <b>Python</b> was
a challenging but rewarding project. The use of <b>XML</b> and <b>XQuery</b> provided a structured
and flexible way to organize the website's content, making it easier to manage and
update. <b>Python</b>, on the other hand, helped automate tasks and streamline the
website's functionality. However, one of the most significant challenges was
ensuring that the website's content was balanced and presented in an objective
manner, especially given the sensitive nature of conspiracy theories. Overall, the
project required a combination of technical skills, critical thinking, and
sensitivity to ensure that the final product was informative and thought-provoking
without promoting any specific conspiracy theory.</p>
<p>&emsp;In addition to <b>XML</b>, <b>XQuery</b>, and <b>Python</b>, incorporating spaCy and Cytoscape into
the project added another layer of complexity and functionality. spaCy, a natural
language processing library, allowed for more advanced analysis of the website's
content and provided the ability to extract and analyze data from unstructured text.
Cytoscape, a network analysis and visualization tool, provided a way to visualize
relationships between various conspiracy theories and the people, organizations, or
events they involve. The integration of these tools into the website not only
enhanced its functionality but also improved its overall user experience by
presenting the content in a more interactive and engaging way. However, it also
required a greater understanding of the technical aspects of these tools, which
added an extra level of complexity to the project. Overall, the use of spaCy and
Cytoscape showcased the possibilities of incorporating cutting-edge technologies
into website development and demonstrated the importance of staying up-to-date with
emerging technologies. </p>
<p>&emsp;Building a website using <b>HTML</b> and <b>CSS</b> comes with its own set of challenges. One
of the biggest challenges is ensuring that the website is responsive and looks good
on different devices with different screen sizes. This involves writing code that
can adapt to different screen sizes and resolutions, which can be time-consuming and
requires attention to detail. The website also allowed us to display all our work in one
place for anyone to see and experience. <b>HTML</b> gave the website its
structure, while <b>CSS</b> let us express ourselves through the styling of the
website.</p>
<p> One of the main challenges behind the scenes was preparing the text files. We had to collect every viable document from
textfiles.com and wrap each reference to an entity in markup. To do this, we
pushed all the files through a pipeline of techniques and scripts.</p>
<p> Once we had scraped the usable conspiracies from textfiles.com, we had to replace all
characters that would be incompatible with the .xml or .html file format. When all
special characters were replaced, we moved on to wrapping paragraphs and
any loose strings of text in &lt;p&gt; elements, so that the texts would be
legible on the website. Once each file had been appropriately wrapped and
checked for errors, the next step was pushing each file into a <b>Python</b> script
that uses natural language processing, regular expressions, and spaCy to wrap any entities it found.
Unfortunately, the script's work was not perfect. There were issues with overlapping
tags, and many names were not wrapped at all. The script had to go through each
file again and check for any invalid wrapping. Once all the files were valid and
free of errors, we used <b>XQuery</b> and <b>XSLT</b> to turn the files into pages
that can be viewed in a browser. </p>
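<p> The overlapping-tag problem mentioned above can be illustrated with a minimal Python sketch. This is not the project's actual script: the function name <b>drop_overlaps</b>, the sample sentence, and the character offsets are all hypothetical, and the "keep the longest match" rule is only one possible policy. The idea is to discard any entity span that overlaps a longer one before any tags are written, so that the wrapped output stays well-formed.</p>

```python
# Hypothetical sample text and entity spans; a real tagger would supply these.
# Each span is (start, end, label) with "end" exclusive, as in Python slicing.
text = "John F. Kennedy met Kennedy aides"
spans = [
    (8, 15, "PERSON"),   # "Kennedy", nested inside the longer match below
    (0, 15, "PERSON"),   # "John F. Kennedy"
    (20, 27, "PERSON"),  # the second, standalone "Kennedy"
]


def drop_overlaps(candidates):
    """Keep the longest spans that do not overlap one another."""
    kept = []
    # Visit the longest spans first (ties go to the leftmost span), so a long
    # match always wins over a shorter match that sits inside it.
    ordered = sorted(candidates, key=lambda s: (-(s[1] - s[0]), s[0]))
    for span in ordered:
        start, end = span[0], span[1]
        # A span is safe when it ends before, or starts after, every kept span.
        if all(k[0] >= end or start >= k[1] for k in kept):
            kept.append(span)
    # Return the survivors in document order, ready for wrapping.
    return sorted(kept)


clean_spans = drop_overlaps(spans)
for start, end, label in clean_spans:
    print(label, text[start:end])
```

<p> With the nested match removed, each remaining span can be wrapped in its own element without one tag opening or closing inside another.</p>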
<p> After all of that, the text files were viewable on the website, with each referenced entity wrapped accordingly. But there was still more to be done with regard to the visualization of the text files and their entities. Using <b>XQuery</b>, we were able to generate a .tsv file consisting of a table that counts how many times each entity was referenced and records which file it was referenced from. We then loaded the .tsv file into Cytoscape and generated a vast network of all of the interconnecting nodes.</p>
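<p> For readers unfamiliar with this kind of table, the following Python sketch shows the general shape of an entity-by-file count exported as tab-separated values. The project generated its table with XQuery, so this is only an illustration in a different language; the entity names, file names, column headers, and the helper names <b>edge_table</b> and <b>to_tsv</b> are hypothetical.</p>

```python
import csv
import io
from collections import Counter

# Hypothetical (entity, source file) pairs, one pair per wrapped reference.
mentions = [
    ("CIA", "jfk.xml"),
    ("CIA", "jfk.xml"),
    ("CIA", "ufo.xml"),
    ("NASA", "ufo.xml"),
]


def edge_table(pairs):
    """Count how many times each entity is referenced in each file."""
    counts = Counter(pairs)
    # One row per (entity, file) combination, sorted for stable output.
    return [(entity, source, n) for (entity, source), n in sorted(counts.items())]


def to_tsv(rows):
    """Serialize the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["entity", "file", "count"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = edge_table(mentions)
tsv_text = to_tsv(rows)
print(tsv_text)
```

<p> Each row can be read as one edge of a network: the entity and the file are the two nodes, and the count is a weight on the connection between them.</p>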
<hr/>
<h2 style="text-align: center;"> Software/Methods Used </h2>
<ol>
<li> Python/PyCharm</li>
<li> XML/oXygen text editor</li>
<li> Cytoscape and AntConc</li>
<li> eXide, XQuery</li>
<li> XSLT</li>
</ol>
</div>
<hr />
<p> Very special thanks to <a href="https://textfiles.com">TextFiles.com</a> for their awesome directory of text files! They were the foundation of this project, and it would have fallen apart without them.</p>
<h2 id="copyright" style="margin:auto;">
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">
<img alt="Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License" style="border-width:0"
src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" />
</a>
<br />
</h2>
</body>
</html>