{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Read Data" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", " \n", "#this assumes one json item per line in json file\n", "df=pd.read_json(\"../data/news_category_dataset.json\", lines=True)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "authors object\n", "category object\n", "date datetime64[ns]\n", "headline object\n", "link object\n", "short_description object\n", "dtype: object" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "124989" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#number of rows (datapoints)\n", "len(df)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | authors | category | date | headline | link | short_description |
|---|---|---|---|---|---|---|
| 2317 | Chris McGonigal | WORLD NEWS | 2018-04-12 | Striking Photos Show Israelis Standing Still F... | https://www.huffingtonpost.com/entry/israelis-... | Traffic stopped for two minutes. |
| 20810 | Bill Bradley and David Moye | ENTERTAINMENT | 2017-07-23 | 'Westworld' Season 2 Teaser Suggests Violent D... | https://www.huffingtonpost.com/entry/westworld... | This time, the robots are in charge? |
| 56479 | Eliot Nelson | POLITICS | 2016-06-10 | Donald Trump Tells Religious Conservatives He'... | https://www.huffingtonpost.com/entry/donald-tr... | His somber address was a hit at the Freedom Co... |
| 39517 | Jenna Amatulli | ARTS & CULTURE | 2016-12-19 | Sadly, 'Puppy' Isn't Merriam-Webster's Word Of... | https://www.huffingtonpost.com/entry/sadly-pup... | “Surreal" won out over "puppy,” “flummadiddle,... |
| 18634 | Robert Koehler, ContributorPeace journalist | WORLD NEWS | 2017-08-17 | Why Does North Korea Hate Us? | https://www.huffingtonpost.com/entry/why-does-... | “The bombing was long, leisurely and merciless... |
| 18900 | East-West Center, ContributorPromoting better ... | WOMEN | 2017-08-15 | Asian Teams Participate In FIRST Global Roboti... | https://www.huffingtonpost.com/entry/asian-tea... | By Xinxin Zhang, Research Intern, East-West Ce... |
| 88345 | Sebastian Murdock | WEIRD NEWS | 2015-06-14 | This Is Not How You Play Frisbee, But We Love ... | https://www.huffingtonpost.com/entry/bosnian-f... | |
| 62054 | James Cave | STYLE | 2016-04-07 | The Real Story On How Trench Coats Got Their Name | https://www.huffingtonpost.com/entry/why-are-t... | Bet you never knew how Burberry struck gold. |
| 66784 | | SPORTS | 2016-02-12 | New York Mets Pitcher Jenrry Mejia Permanently... | https://www.huffingtonpost.comhttp://pubx.co/h... | New York Mets relief pitcher Jenrry Mejia has ... |
| 73038 | | SPORTS | 2015-12-03 | New Wave of Arrests in FIFA Corruption Scandal | https://www.huffingtonpost.comhttp://www.nytim... | At least some of the arrests took place at the... |
| 52487 | Michael McLaughlin | POLITICS | 2016-07-25 | California City Bans Deceptive Ads By Anti-Abo... | https://www.huffingtonpost.com/entry/oakland-b... | Oakland's law targets pregnancy clinics that o... |
| 111907 | Bill Bradley | ENTERTAINMENT | 2014-09-15 | Daryl Dixon's Big Secret Is Finally Revealed | https://www.huffingtonpost.com/entry/walking-d... | |
| 64576 | Zach Carter and Shahien Nasiripour | POLITICS | 2016-03-08 | Elizabeth Warren Is Not On The Ballot And Her ... | https://www.huffingtonpost.com/entry/elizabeth... | The progressive movement faces its biggest test. |
| 39143 | Nick Visser | HEALTHY LIVING | 2016-12-23 | Scientists Create Effective Ebola Vaccine, Jus... | https://www.huffingtonpost.com/entry/ebola-vac... | "When the next Ebola outbreak hits, we will no... |
| 87400 | Beth Weissenberger, Contributor | HEALTHY LIVING | 2015-06-25 | The Soul's Ingredients: The Secret to Summonin... | https://www.huffingtonpost.com/entry/the-souls... | |
| 32935 | Rebecca Shapiro | COMEDY | 2017-03-02 | Seth Meyers Chews Out The Media For Gushing Ov... | https://www.huffingtonpost.com/entry/seth-meye... | "Guys, seriously, do you have amnesia?" |
| 48774 | Lee Moran | COMEDY | 2016-09-05 | John Oliver Lists The Habits We Should Actuall... | https://www.huffingtonpost.com/entry/john-oliv... | Temporarily not wearing white isn't enough for... |
| 17443 | Dave Jamieson | POLITICS | 2017-09-01 | Labor Lawyers Blast Trump Administration For P... | https://www.huffingtonpost.com/entry/hundreds-... | "To place him next to the brave men and women ... |
| 114621 | Andy Campbell | WEIRD NEWS | 2014-08-16 | Wrong-Way Driver Hits Cyclists, Hides Meth In ... | https://www.huffingtonpost.com/entry/wrongway-... | |
| 87370 | Michael Ernest Sweet, ContributorCanadian Writ... | QUEER VOICES | 2015-06-25 | The Last Boys: A Book Of Photographs By Barry ... | https://www.huffingtonpost.com/entry/the-last-... | |
| 119406 | Sara Eckel, ContributorAuthor, It’s Not You: 2... | WOMEN | 2014-06-23 | Are You Hot or Not? The Answer May Change With... | https://www.huffingtonpost.com/entry/are-you-h... | We all venture into the dating world hoping th... |
| 117313 | Lisa Copeland, ContributorDating Coach For Wom... | FIFTY | 2014-07-16 | Top 20 Dating Tips For Finding Love Again Afte... | https://www.huffingtonpost.com/entry/dating-ov... | 1. Put in writing what type of relationship yo... |
| 95992 | Janell Burley Hofmann, ContributorAuthor, Spea... | PARENTS | 2015-03-18 | 5 Tiny Stories From SxSW | https://www.huffingtonpost.com/entry/5-tiny-st... | You told the panel that they changed your life... |
| 3400 | Mary Papenfuss | ENTERTAINMENT | 2018-03-23 | Bill Murray Compares Parkland Teens To Vietnam... | https://www.huffingtonpost.com/entry/parkland-... | When your idealism isn't "broken yet," you spe... |
| 96289 | Ron Dicker | SCIENCE | 2015-03-14 | You Might As Well Flip A Coin To Fill In Your ... | https://www.huffingtonpost.com/entry/flip-coin... | |
| 91428 | China Hands, ContributorFor Future Leaders in ... | POLITICS | 2015-05-10 | Beyond the Gaokao: How Chinese Students Earn T... | https://www.huffingtonpost.com/entry/beyond-th... | Qi recounts three stories of students making t... |
| 29215 | Mary Papenfuss | SPORTS | 2017-04-15 | NFL's Todd Heap Runs Over Daughter In Deadly A... | https://www.huffingtonpost.com/entry/todd-heap... | The tragedy occurred in driveway of his Arizon... |
| 49611 | Michael McLaughlin | POLITICS | 2016-08-26 | Trump And Clinton Supporters Find Common Groun... | https://www.huffingtonpost.com/entry/trump-and... | But on other gun policy questions, the gaps ke... |
| 53475 | Leigh Blickley | ENTERTAINMENT | 2016-07-14 | 'Game Of Thrones' Stars React To Their Emmy No... | https://www.huffingtonpost.com/entry/game-of-t... | So many (23, to be exact) noms for this amazin... |
| 79857 | James Michael Nichols | QUEER VOICES | 2015-09-18 | Seasonal Queer Coming-Of-Consciousness Party '... | https://www.huffingtonpost.com/entry/psychic-s... | "We don’t impose rules on people -- the vibe i... |
| ... | ... | ... | ... | ... | ... | ... |
| 72907 | | CRIME | 2015-12-04 | California Rampage Shocks Those Who Knew Shooters | https://www.huffingtonpost.com/entry/san-berna... | "This was a person who was successful, who had... |
| 102621 | Nick Visser | GREEN | 2014-12-31 | The 10 Best Photos From The Dept. Of The Inter... | https://www.huffingtonpost.com/entry/departmen... | |
| 108212 | Cynthia Dagnal-Myron, ContributorReporter, Aut... | ENTERTAINMENT | 2014-10-28 | Jack Bruce: A Fond Fan Farewell | https://www.huffingtonpost.com/entry/jack-bruc... | Clapton was God to some--or so the graffiti sa... |
| 94707 | Kevin Price, ContributorPublisher and Editor i... | POLITICS | 2015-04-02 | Federal Laws: Too Numerous and Vague | https://www.huffingtonpost.com/entry/federal-l... | The vast majority of Americans clearly seem un... |
| 54325 | Sarah Guerrero, ContributorWriter & Stay at Ho... | WOMEN | 2016-07-04 | I Had An Early Miscarriage | https://www.huffingtonpost.com/entry/i-had-an-... | Grief sneaks out during tired moments. |
| 21837 | Cavan Sieczkowski | ENTERTAINMENT | 2017-07-11 | Ryan Reynolds Gave The Best Nod To The Glory O... | https://www.huffingtonpost.com/entry/ryan-reyn... | Bow down. |
| 102087 | Mark DeCarlo, Contributor3 time Emmy Award win... | HEALTHY LIVING | 2015-01-06 | Michael Symon: Cleveland's Real King | https://www.huffingtonpost.com/entry/michael-s... | |
| 101670 | | RELIGION | 2015-01-11 | Police Chief To Black Churches: 'We Can't Do T... | https://www.huffingtonpost.com/entry/police-bl... | |
| 44538 | Nina Golgowski | LATINO VOICES | 2016-10-23 | Eric Trump Poses With Woman Wearing 'Latina Ag... | https://www.huffingtonpost.com/entry/eric-trum... | "It stands as a testament to the lack of diver... |
| 50770 | Cole Delbyck | ENTERTAINMENT | 2016-08-13 | Kenny Baker, The Actor Who Played R2-D2 In 'St... | https://www.huffingtonpost.com/entry/kenny-bak... | Sad beep. |
| 50828 | Cristian Farias | POLITICS | 2016-08-12 | Federal Judges Can't Clear Someone's Record, E... | https://www.huffingtonpost.com/entry/federal-j... | A court said they are without authority to wip... |
| 109132 | Ayala Laufer-Cahana, M.D., ContributorPhysicia... | HEALTHY LIVING | 2014-10-17 | Brain Zapping for Weight Loss | https://www.huffingtonpost.com/entry/brain-zap... | |
| 65178 | Daniel Marans | POLITICS | 2016-03-02 | Mainstream Republicans Are Unsure How To Stop ... | https://www.huffingtonpost.com/entry/republica... | Others are resigned to his victory -- and are ... |
| 17870 | Emma Gray | WOMEN | 2017-08-28 | Trump Confuses Two Female Finnish Journalists ... | https://www.huffingtonpost.com/entry/trump-con... | The Finnish president had to explain that they... |
| 50296 | Natalie Jackson and Ariel Edwards-Levy | POLITICS | 2016-08-18 | HUFFPOLLSTER: Without Donald Trump, Republican... | https://www.huffingtonpost.com/entry/without-d... | The Republican nominee should be polling bette... |
| 2924 | Andy McDonald | ENTERTAINMENT | 2018-04-02 | 'Broad City' Co-Creator Developing 'League Of ... | https://www.huffingtonpost.com/entry/league-of... | The project reportedly received the blessing o... |
| 80415 | Vishavjit Singh, ContributorEditorial cartoonist | RELIGION | 2015-09-12 | The Sikh Boy Who Developed Breasts And Grew Up... | https://www.huffingtonpost.com/entry/the-sikh-... | In my early childhood photos I look like a hea... |
| 16535 | Michael Tesler, ContributorAssociate Professor... | POLITICS | 2017-09-14 | Jemele Hill’s The Mainstream: Most Americans T... | https://www.huffingtonpost.com/entry/jemele-hi... | Jemele Hill, the ESPN host of SportsCenter’s “... |
| 93889 | Ron Dicker | LATINO VOICES | 2015-04-11 | Zoe Saldana Opens Up On Facebook About Post-Bi... | https://www.huffingtonpost.com/entry/zoe-salda... | |
| 119723 | Eric J. Hall, ContributorPresident & CEO of He... | HEALTHY LIVING | 2014-06-19 | Many Doctors Don't Support Life Support When ... | https://www.huffingtonpost.com/entry/death-and... | A recent study of more than a thousand doctors... |
| 3827 | Curtis M. Wong | QUEER VOICES | 2018-03-16 | Newspaper Scraps References To Gay Man's Husba... | https://www.huffingtonpost.com/entry/texas-new... | The publisher of Texas' Olton Enterprise said ... |
| 14693 | Alexander C. Kaufman | POLITICS | 2017-10-07 | Pittsburgh’s Mayor Calls For ‘An American Mars... | https://www.huffingtonpost.com/entry/peduto-tr... | This could be the key for Democrats to win bac... |
| 58342 | Maddie Crum | ARTS & CULTURE | 2016-05-19 | A Visual Survey Of Retro Computers That Predat... | https://www.huffingtonpost.com/entry/retro-com... | Elegant technology has a long history. |
| 124933 | David Finkle, ContributorWriter, Drama Critic | ARTS | 2014-04-18 | First Nighter: Moss Hart's "Act One" in Two Gr... | https://www.huffingtonpost.com/entry/first-nig... | |
| 19126 | Sebastian Murdock | CRIME | 2017-08-12 | 'Beautiful Moment Ripped Away' As Car Plows In... | https://www.huffingtonpost.com/entry/during-ra... | "These terrorists aren't trolls -- they're ter... |
| 57926 | | GOOD NEWS | 2016-05-24 | Paralyzed Dog Was About To Be Put Down When So... | https://www.huffingtonpost.comhttp://pubx.co/o... | Ollie, a 10-year-old Shetland sheepdog (aka Sh... |
| 86604 | Dragos Bratasanu, ContributorMake your dream a... | HEALTHY LIVING | 2015-07-03 | How To Create Giant Success (And Live A Fulfil... | https://www.huffingtonpost.com/entry/7-steps-t... | There is a series of mental events that lead t... |
| 36765 | Ron Dicker | ENTERTAINMENT | 2017-01-19 | Woman Waving Palestinian Flag Accosts Bella Ha... | https://www.huffingtonpost.com/entry/woman-wav... | Police say a harassment report has been filed. |
| 56098 | Minou Clark, The Huffington Post | COMEDY | 2016-06-14 | 23 Things You'll Only Understand If You Still ... | https://www.huffingtonpost.com/entry/yup-still... | There's no place like home, right? |
| 52090 | Kate Sheppard | POLITICS | 2016-07-29 | Bill And Tim (And Hillary And Barack's) Excell... | https://www.huffingtonpost.com/entry/photos-ba... | Behind the scenes at the Democratic National C... |

100 rows × 6 columns
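
The sample shows three text-bearing columns: `short_description`, `headline`, and `link`. The names `text_desc`, `text_desc_headline`, and `text_desc_headline_url` in the results table suggest these columns are concatenated into a single text field before feature extraction. The following is a minimal sketch of one way that could be done, not the notebook's own code. The helper names `url_to_text` and `combine_fields`, and the rule of splitting the URL path into word tokens, are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse


def url_to_text(link: str) -> str:
    """Turn the path of a URL into space-separated lowercase word tokens.

    Hypothetical helper: the tokenization rule is an assumption.
    """
    path = urlparse(link).path
    return " ".join(re.findall(r"[a-z0-9]+", path.lower()))


def combine_fields(row: dict, use_headline: bool = True, use_url: bool = True) -> str:
    """Concatenate description, headline, and URL tokens into one string.

    Missing or empty fields are skipped, since the sample contains rows
    with an empty short_description.
    """
    parts = [row.get("short_description", "")]
    if use_headline:
        parts.append(row.get("headline", ""))
    if use_url:
        parts.append(url_to_text(row.get("link", "")))
    return " ".join(p for p in parts if p).strip()


# Made-up row for illustration only (not taken from the dataset).
example_row = {
    "short_description": "Short text.",
    "headline": "A Headline",
    "link": "https://example.com/entry/a-b",
}

print(combine_fields(example_row, use_headline=False, use_url=False))  # description only
print(combine_fields(example_row, use_url=False))                      # description + headline
print(combine_fields(example_row))                                     # description + headline + URL tokens
```

With a DataFrame, the same function could be applied row by row, for example `df.apply(lambda r: combine_fields(r.to_dict()), axis=1)`.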
| | text_fields | feature_representation | top_k | accuracy | mrr_at_k |
|---|---|---|---|---|---|
| 8 | text_desc_headline_url | tfidf | 3 | 0.867256 | 0.751152 |
| 6 | text_desc_headline_url | binary | 3 | 0.830165 | 0.715587 |
| 7 | text_desc_headline_url | counts | 3 | 0.829653 | 0.718131 |
| 5 | text_desc_headline | tfidf | 3 | 0.835925 | 0.717171 |
| 3 | text_desc_headline | binary | 3 | 0.794675 | 0.679169 |
| 4 | text_desc_headline | counts | 3 | 0.792179 | 0.677894 |
| 2 | text_desc | tfidf | 3 | 0.630632 | 0.510838 |
| 0 | text_desc | binary | 3 | 0.598054 | 0.480489 |
| 1 | text_desc | counts | 3 | 0.595526 | 0.478436 |
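
The `accuracy` and `mrr_at_k` columns are reported with `top_k` set to 3. The sketch below shows how such metrics are commonly computed, assuming that a prediction counts as correct when the true category appears anywhere in the top k ranked predictions, and that MRR@k averages the reciprocal of the rank at which the true category appears (0 when it is absent). This is an illustration under those assumptions, not necessarily the exact evaluation code that produced the table. The function names and the toy labels are hypothetical.

```python
def reciprocal_rank(true_label, ranked_predictions, k=3):
    """Return 1/rank of true_label within the top k predictions, or 0.0 if absent."""
    for rank, label in enumerate(ranked_predictions[:k], start=1):
        if label == true_label:
            return 1.0 / rank
    return 0.0


def top_k_accuracy(true_labels, ranked, k=3):
    """Fraction of items whose true label appears in the top k predictions."""
    hits = sum(1 for t, preds in zip(true_labels, ranked) if t in preds[:k])
    return hits / len(true_labels)


def mrr_at_k(true_labels, ranked, k=3):
    """Mean reciprocal rank over all items, truncated at k."""
    total = sum(reciprocal_rank(t, p, k) for t, p in zip(true_labels, ranked))
    return total / len(true_labels)


# Toy data for illustration only: four items, each with three ranked predictions.
y_true = ["POLITICS", "SPORTS", "COMEDY", "CRIME"]
y_ranked = [
    ["POLITICS", "WORLD NEWS", "CRIME"],   # hit at rank 1
    ["ENTERTAINMENT", "SPORTS", "COMEDY"], # hit at rank 2
    ["WOMEN", "PARENTS", "COMEDY"],        # hit at rank 3
    ["POLITICS", "WORLD NEWS", "GREEN"],   # miss
]

print(top_k_accuracy(y_true, y_ranked, k=3))
print(mrr_at_k(y_true, y_ranked, k=3))
```

Under these definitions MRR@k can never exceed top-k accuracy, which is consistent with every row of the table.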