Book :: Natural Language Processing

8 posts are tagged with the keyword "book"

The book Dive into Deep Learning

This book has been released as open source, with examples from language and image processing.

21 Esfand 1397, 19:51, 0 comments

Mohammad Sadegh Rasooli

In Iran this probably does not count as news. I remember that this book was already available 8 years ago; the Sharif University bookstore even sold a poor-quality copy of it. In any case, this book has been the reference text at many universities, and it has recently been released for free.

https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/?OCID=msr_ebook_cbishop_tw

There is another book in this area that is much newer and open source:

https://mml-book.github.io/

06 Azar 1397, 21:47, 0 comments

Mohammad Sadegh Rasooli

New books, including "Neural Networks in Natural Language Processing"

In the time since I last posted, Morgan & Claypool has published many books. The most important of them is "Neural Network Methods for Natural Language Processing":

Neural Network Methods for Natural Language Processing

Neural networks are a family of powerful machine learning models. This book focuses on the application of neural network models to natural language data. The first half of the book (Parts I and II) covers the basics of supervised machine learning and feed-forward neural networks, the basics of working with machine learning over language data, and the use of vector-based rather than symbolic representations for words. It also covers the computation-graph abstraction, which allows to easily define and train arbitrary neural networks, and is the basis behind the design of contemporary neural network software libraries.

The second part of the book (Parts III and IV) introduces more specialized neural network architectures, including 1D convolutional neural networks, recurrent neural networks, conditioned-generation models, and attention-based models. These architectures and techniques are the driving force behind state-of-the-art algorithms for machine translation, syntactic parsing, and many other applications. Finally, we also discuss tree-shaped networks, structured prediction, and the prospects of multi-task learning.

The book "Deep Learning" has also been finalized, and its free version is available: http://www.deeplearningbook.org/

07 Ordibehesht 1396, 21:17, 0 comments

Mohammad Sadegh Rasooli

Book: Bayesian Analysis in Natural Language Processing

Morgan & Claypool has recently published this book. Three years ago, a draft of it served as the lecture notes for one of our courses, which had the same title. Despite some unclear passages, I think this book is currently the most comprehensive reference on the subject (if you do not have access to the book, send me an email).

Download the book

Bayesian Analysis in Natural Language Processing

Shay Cohen (University of Edinburgh)

Natural language processing (NLP) went through a profound transformation in the mid-1980s when it shifted to make heavy use of corpora and data-driven techniques to analyze language. Since then, the use of statistical techniques in NLP has evolved in several ways. One such example of evolution took place in the late 1990s or early 2000s, when full-fledged Bayesian machinery was introduced to NLP. This Bayesian approach to NLP has come to accommodate for various shortcomings in the frequentist approach and to enrich it, especially in the unsupervised setting, where statistical learning is done without target prediction examples.

We cover the methods and algorithms that are needed to fluently read Bayesian learning papers in NLP and to do research in the area. These methods and algorithms are partially borrowed from both machine learning and statistics and are partially developed "in-house" in NLP. We cover inference techniques such as Markov chain Monte Carlo sampling and variational inference, Bayesian estimation, and nonparametric modeling. We also cover fundamental concepts in Bayesian statistics such as prior distributions, conjugacy, and generative modeling. Finally, we cover some of the fundamental modeling techniques in NLP, such as grammar modeling and their use with Bayesian analysis.

26 Khordad 1395, 00:42, 0 comments

Mohammad Sadegh Rasooli

Book: Grammatical Inference for Computational Linguistics

This book has recently been published by Morgan & Claypool.

Grammatical Inference for Computational Linguistics
Jeffrey Heinz, Colin de la Higuera, Menno van Zaanen

Abstract
This book provides a thorough introduction to the subfield of theoretical computer science known as grammatical inference from a computational linguistic perspective. Grammatical inference provides principled methods for developing computationally sound algorithms that learn structure from strings of symbols. The relationship to computational linguistics is natural because many research problems in computational linguistics are learning problems on words, phrases, and sentences: What algorithm can take as input some finite amount of data (for instance a corpus, annotated or otherwise) and output a system that behaves "correctly" on specific tasks?
Throughout the text, the key concepts of grammatical inference are interleaved with illustrative examples drawn from problems in computational linguistics. Special attention is paid to the notion of "learning bias." In the context of computational linguistics, such bias can be thought to reflect common (ideally universal) properties of natural languages. This bias can be incorporated either by identifying a learnable class of languages which contains the language to be learned or by using particular strategies for optimizing parameter values. Examples are drawn largely from two linguistic domains (phonology and syntax) which span major regions of the Chomsky Hierarchy (from regular to context-sensitive classes). The conclusion summarizes the major lessons and open questions that grammatical inference brings to computational linguistics.

Table of Contents: List of Figures / List of Tables / Preface / Studying Learning / Formal Learning / Learning Regular Languages / Learning Non-Regular Languages / Lessons Learned and Open Problems / Bibliography / Author Biographies

If you do not have access to this book, send me an email.

20 Aban 1394, 00:54, 0 comments

Mohammad Sadegh Rasooli

New papers and books

Blogfa has once again run into strange problems, and it is not at all clear whether I can keep going on this platform under these conditions.

After a fairly long absence, I am updating the blog with a few fairly unrelated items.

****

My recent paper on learning a syntactic parser without a treebank, using translation data instead, has been published at EMNLP 2015:

Mohammad Sadegh Rasooli and Michael Collins. Density-Driven Cross-Lingual Transfer of Dependency Parsers. Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 328–338, Lisboa, Portugal, September 2015. [Slides]

****

Morgan & Claypool has recently published a variety of books on natural language processing, including "Natural Language Processing for Social Media" and "Automatic Detection of Verbal Deception".

****

One of the popular approaches of the last year or two is "deep learning" with neural networks. For those interested in the topic, this long book is recommended. It is very specialized, however, and the following paper may be much more practical and accessible:

Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing. arXiv preprint, Oct. 2015.

15 Mehr 1394, 01:22, 0 comments

Mohammad Sadegh Rasooli

Recent conferences and a paper on Persian language processing

Hello,

After Blogfa's long outage and the loss of some of its data, I have to start posting again.

A lot has happened in language processing in the meantime: NAACL 2015 and ACL-IJCNLP 2015 have been held, and EMNLP 2015 will take place soon. Videos of the NAACL 2015 talks can be downloaded from this link.

The Morgan & Claypool books have also been updated, and a book on semantic similarity was recently published.

We also recently had a paper at ACL-IJCNLP on the Ezafe construction in Persian, which I recommend reading if you are interested in Persian language processing:

Alireza Nourian, Mohammad Sadegh Rasooli, Mohsen Imany, and Heshaam Faili. On the Importance of Ezafe Construction in Persian Parsing. The 53rd Annual Meeting of the Association for Computational Linguistics (ACL) and the 7th International Joint Conference on Natural Language Processing (IJCNLP), Beijing, China, July 2015.

We also have a paper forthcoming at EMNLP on dependency parsing with bilingual data, which, God willing, I will post on the blog soon.

12 Mordad 1394, 02:51, 3 comments

Mohammad Sadegh Rasooli

Linguistic Structure Prediction

These sessions were prepared as a crash course in language processing, based on selected topics from the book "Linguistic Structure Prediction" by Noah Smith (2011), and were presented at the technical meetings of the Computer Research Center of Islamic Sciences.