Spacy - v3.6.0


✨ New features and improvements

  • NEW: span_finder pipeline component to identify overlapping, unlabeled spans (#12507).
  • Language updates:
    • Add initial support for Malay (#12602).
    • Update Latin defaults to support noun chunks, update lexical/tokenizer defaults and add example sentences (#12538).
  • Add option to return scores separately keyed by component name with spacy evaluate --per-component, Language.evaluate(per_component=True) and Scorer.score(per_component=True) (#12540).
  • Support custom token/lexeme attribute for vectors (#12625).
  • Support spancat_singlelabel in spacy debug data CLI (#12749).
  • Typing updates for PhraseMatcher and SpanGroup (#12642, #12714).

🔴 Bug fixes

  • 12569: Require that all SpanGroup spans come from the current doc.

📦 Trained pipelines updates

We have added new pipelines for Slovenian that use the trainable lemmatizer and floret vectors.

| Package | UPOS | Parser LAS | NER F |
| --- | --- | --- | --- |
| sl_core_news_sm | 96.9 | 82.1 | 62.9 |
| sl_core_news_md | 97.6 | 84.3 | 73.5 |
| sl_core_news_lg | 97.7 | 84.3 | 79.0 |
| sl_core_news_trf | 99.0 | 91.7 | 90.0 |

  • 🙏 Special thanks to @orglce for help with the new pipelines!

The English pipelines have been updated to improve handling of contractions with various apostrophes and to lemmatize "get" as a passive auxiliary.

The Danish pipeline da_core_news_trf has been updated to use vesteinn/DanskBERT with performance improvements across the board.

⚠️ Backwards incompatibilities

  • SpanGroup spans are now required to be from the same doc. When initializing a SpanGroup, there is a new check to verify that all added spans refer to the current doc. Without this check, it was possible to run into string store or other errors.

📖 Documentation and examples

👥 Contributors

@adrianeboyd, @bdura, @danieldk, @davidberenstein1957, @diyclassics, @essenmitsosse, @honnibal, @ines, @isabelizimm, @jmyerston, @kadarakos, @KennethEnevoldsen, @khursani8, @ljvmiranda921, @rmitsch, @shadeMe, @svlandeg, @tomaarsen, @victorialslocum, @vin-ivar, @ZiadAmerr


Details

date
July 7, 2023, 8:32 a.m.
name
v3.6.0: New span finder component and pipelines for Slovenian
type
Minor
👇
Register or login to:
  • 🔍View and search all Spacy releases.
  • 🛠️Create and share lists to track your tools.
  • 🚨Setup notifications for major, security, feature or patch updates.
  • 🚀Much more coming soon!
Continue with GitHub
Continue with Google
or