This repository has been archived by the owner on Oct 8, 2019. It is now read-only.

Homepage #1

Open
3 of 17 tasks
arunsrinivasan opened this issue Jun 15, 2014 · 3 comments

Comments

@arunsrinivasan
Member

Migrated Friday Jun 13, 2014 at 15:51 GMT
Originally opened as Rdatatable/data.table#695


Tooling to make the content easy to manage later:

  • Custom script for parsing nested <div> elements (likely to evolve as we add content).
  • Doc on using the parser to write custom markdown: a first-level parse with the custom script, then a second-level pass with pandoc to convert to HTML
  • Document on the transition from R-Forge to GitHub (for future use)
  • Document on moving issues from R-Forge to GitHub (for future use)
  • Makefile to automate the build, plus a Travis CI build
  • Template markdown file (for the Gallery) where contributors only need to fill in the required content

Website:

  • Start page design (almost done)
  • Getting started section:
    • About (almost done; updating with content from Matt on more of data.table's history)
    • Quick introduction (Waiting for updated content from Matt)
    • Learn by example / frequently performed operations? (NEW) (Current task)
    • FAQ (Waiting for updated content from Matt)
  • Benchmarks
    • base
    • dplyr
    • any other tool?
  • What's new?
    • Specific to current release (with links to older versions)
  • Release notes link to README.md (add #NEWS anchor a bit further down, if possible)
  • Gallery
    • Will most likely showcase the operations that data.table makes easy, similar to Dirk's gallery.rcpp.org.
  • How can I contribute?
    • For people interested in filing/fixing bugs, implementing feature requests, fixing docs, etc., and for people who'd like to contribute a Gallery article.

Tentative ideas. Likely to change. More to come.

@arunsrinivasan
Member Author

Suggestions to improve the current Quick intro vignette:

  • Swap the order of the Fast Grouping and Keys sections, now that we promote ad-hoc grouping (which needs no keys).

Timing comparisons in this section will be quite convincing to the user, and the only new concept to learn is data.table's syntax. Equivalent examples in base R let the user relate to the type of operation they're doing. It's a more natural way to start, IMO.

The end of the Keys section talks about Joins, so the flow will be more continuous if Keys is the 2nd section. The fact that joins require keys can then be established much more naturally.

  • Joins need a minimal example explaining the X[Y] syntax, perhaps covering only the mult="first" argument (leaving the rest for the user to discover, or moving it to the Learn by Example section).

Improvements/Additions for FAQ:

  • Explicitly copy() when assigning a column or names(DT) to a variable.
  • Clear up X[Y] usage (if there's room for improvement) as posted here and here - although I do not agree with the attitude of the OP in that post.
  • Explain clearly that data.table, by design, modifies by reference, trading off referential transparency, with much larger in-memory data in mind.
  • Explicit by-without-by using by=.EACHI.
  • Add an explanation of the error message asking for allow.cartesian=TRUE; this comes up too often on SO.
  • Purpose of the row.names attribute (recent SO question). Worth a FAQ?
  • rbindlist can now bind by names as well (this SO question). Not sure it's a FAQ, though; maybe titled "Can rbindlist bind by matching names?"
  • GForce: just the concept, the functions optimised so far, and what's on the list (#523).
  • setNumericRounding: its default value, and (briefly) why tolerance is handled this way.
  • with=FALSE returns a data.table even when only one column is subset, the reason being that drop=FALSE is the default (and drop isn't implemented yet).
  • There's also one under the Documentation tracker: #517 Embellish FAQ 1.6.
  • Remove FAQ 1.6, or rename its title and reword its content to avoid confusing people who take it very literally (like the one on Twitter).
  • The FAQ on scoping of i should move to the beginners' FAQ.
  • FAQ explaining that DT[order(.)] is optimised by default to the internal fast order, which sorts in C-locale. If locale is important, users should specify base::order? Or maybe change the syntax to DT[forder(.)], with forder(.) usable only within [.data.table? That would avoid both the confusion and the FAQ. Tagging data.table#478 ([R-Forge #5613] order and base::order give different results).

Will update if I come across anything else.

@jangorecki
Member

A benchmark against Python's pandas would be good; the highest-voted SO data.table question still contains (I believe) a heavily outdated benchmark.

@arunsrinivasan
Member Author

@jangorecki Agreed, if someone's willing to contribute pandas code...


2 participants