modified some code and new markdown file with notes

2025-09-20 12:24:39 -04:00 · 2023-03-27 18:35:05 -04:00 · 2023-03-27 18:35:05 -04:00 · 21283b1ebb
commit 21283b1ebb
parent b5373e4b4b
2 changed files with 8 additions and 1 deletions
--- a/pythonCode/main.py
+++ b/pythonCode/main.py
@@ -2,7 +2,7 @@ import spacy
 from collections import Counter
 import re as regex
 import os
-# Uncomment this line if you need the language model.
+# nlp = spacy.cli.download("en_core_web_lg")
 # If you already have it, comment it out.
 # Let's try the different spaCy language models for this. We can compare _lg with _md or _sm
 workingDir = os.getcwd()
@@ -33,6 +33,7 @@ def entitycollector(tokens):
            with open("outputNames.txt", 'a') as f:
                f.write("\n" + entity.text)
                print("Writing in outputNames.txt: " + entity.text)
+            ## The line below prints each entity's text, label, and label explanation
            # print(entity.text, entity.label_, spacy.explain(entity.label_))
            entities.append(entity.text)
        return entities
--- a/pythonCode/read.md
+++ b/pythonCode/read.md
@@ -0,0 +1,6 @@
+# PythonCode 
+## Notes
+- Uncomment the ```nlp = spacy.cli.download("en_core_web_lg")``` line to download the language model. You can comment it out again once the download finishes.
+- The ```re``` library provides our regex functions. It is imported under the alias ```regex``` and uses standard regular expression syntax.
+- Every time ```main.py``` launches, ```outputNames.txt``` is cleared. The script still needs to be run over all of our files, and whether every file will process correctly is still an open question.
+- We will need to modify the code so that it can produce new ```.xml``` files. It is probably best to write them to a new directory once we start on that.
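
Note on the last two items in the notes: below is a minimal sketch of how the output handling might look, using only the standard library. It is not taken from `main.py`. The helper names (`reset_output`, `append_name`, `write_entities_xml`), the `<entities>`/`<entity>` element names, and the one-XML-file-per-source layout are all assumptions for illustration. The real `.xml` structure for this project is not decided here.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def reset_output(path):
    """Truncate the names file so each run starts empty (hypothetical helper)."""
    Path(path).write_text("", encoding="utf-8")


def append_name(path, name):
    """Append one entity name, mirroring the append-mode write in the diff above."""
    with open(path, "a", encoding="utf-8") as f:
        f.write("\n" + name)


def write_entities_xml(names, source_name, out_dir):
    """Write names to <out_dir>/<source stem>.xml; the layout is an assumption."""
    out_path = Path(out_dir)
    # Create the new output directory if it does not exist yet.
    out_path.mkdir(parents=True, exist_ok=True)
    root = ET.Element("entities", {"source": source_name})
    for name in names:
        ET.SubElement(root, "entity").text = name
    target = out_path / (Path(source_name).stem + ".xml")
    ET.ElementTree(root).write(str(target), encoding="utf-8", xml_declaration=True)
    return target
```

Writing into a separate directory keeps generated files apart from the source files, so a rerun can overwrite its own output without touching the originals.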