Big Data Analytics Resume
Virginia
SUMMARY:
- 7+ years of professional IT experience, including 4 years in Big Data/Hadoop and Big Data analytics.
- Experience working with the BI team to translate big data requirements into Hadoop-centric solutions.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop ecosystem components.
- Hands-on experience with Hadoop technologies such as HDFS, Hive, Pig, Sqoop, Impala, Flume, and Spark.
- Hands-on experience writing Hive and Pig scripts that run as MapReduce jobs.
- Experience importing and exporting data between different systems and the Hadoop file system using Sqoop.
- Experience creating databases, tables, and views with HiveQL, Impala, and Pig Latin.
- Experience with file formats such as Avro, ORC, SequenceFile, and CSV.
- Excellent understanding of NoSQL databases such as HBase.
- Implemented proofs of concept on the Hadoop stack and various big data analytics tools, including migrations from relational databases (Oracle, MySQL) to Hadoop.
- Highly experienced Database Administrator: design, data modeling, installation, configuration, administration, performance monitoring, troubleshooting, and fine-tuning of RDBMS (DB2 LUW, SQL Server, Oracle, ParAccel/Matrix), NoSQL (Cassandra, SciDB, Redis, and MongoDB), and graph (Neo4j) databases.
- Experience installing and maintaining the entire Hadoop ecosystem and its components (HDFS, Hive); received an Above and Beyond award for the Hadoop implementation.
- Experience transferring data between Hadoop and other data sources (DB2, Oracle, etc.) using Sqoop.
- Excellent programming experience in Java, Scala, Python, R, C, C++, Perl, Ksh, JavaScript, Pig, Hive, Impala, CSS, and NLP.
- Excellent visualization skills using D3.js, Python, and R.
- Enterprise-level Data Architect: analysis and data integration roadmaps, real-time and batch ETL processing, interpreting business needs into RDBMS and NoSQL conceptual
TECHNICAL SKILLS:
Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, Spark, Storm, Impala, and Kafka.
Programming: R, Python, SQL, Twitter & LinkedIn APIs, web scraping.
Databases: Oracle 10g, IBM DB2, MySQL, SQL Server, SAP RMS.
IDEs/Tools: Tableau, MS Excel Risk Solver, Anaconda, PyCharm, IPython Notebook, Amazon Web Services.
Operating Systems: Linux (Ubuntu, CentOS, Red Hat Linux), Windows XP/7/8/10, OS X 10.11
Version Control: GitHub, SVN.
ANALYTICAL SKILLS: Machine Learning, Data Mining, Sentiment Analysis, Predictive Analytics, Statistical Data Analysis, Optimization, Decision Trees, Sensitivity Analysis, Data Modelling, Data Wrangling, Data Visualization, Cluster Analysis.
PROFESSIONAL EXPERIENCE:
Confidential, Virginia
Big Data Analytics
Responsibilities:
- Received an Above and Beyond award for successfully implementing Big Data/Hadoop.
- Documented the Hadoop installation process, including Kerberos authentication, policy-level Ranger authorization, monitoring setup, and backups.
- Actively involved in data ingestion from DB2 to Hadoop.
- Actively involved in the Hadoop upgrade project.
- Played a key role in the ParAccel/Matrix implementation, including troubleshooting, leader node HA, report automation, backup automation, and boot-from-SAN conversion.
- Applied machine learning using Spark ML.
- Used Oozie to define and schedule jobs.
- Implemented Spark SQL jobs to read and analyze data from Hive and write results to HDFS/Hive.
- Experience with storage and processing in Hue across the Hadoop ecosystem components.
- Basic experience with Tableau reporting tools.
- Involved in all stages of the software development life cycle.
Environment: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, Spark, Storm, Impala, Kafka, Python, and SQL.
Confidential, Columbus, GA
Hadoop Data Analyst/Developer
Responsibilities:
- Involved in end-to-end data processing, including ingestion, processing, quality checks, and splitting.
- Brought data into the big data lake using Pig, Sqoop, and Hive.
- Wrote a MapReduce job for change data capture on HBase.
- Created Hive ORC and external tables.
- Refined terabytes of data from different sources and created Hive tables.
- Developed MapReduce jobs for data cleaning and preprocessing.
- Imported and exported data between Oracle and Teradata databases and HDFS/Hive using Sqoop.
- Responsible for managing data coming from different sources.
- Monitored running MapReduce jobs on the cluster using Oozie.
- Responsible for loading data from UNIX file systems into HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Wrote Pig scripts to process unstructured data and create structured data for use with Hive.
- Wrote Oozie workflows to coordinate Hadoop jobs.
Environment: Sqoop, Pig, Hive, MapReduce, Java, Oozie, Eclipse, Linux, Oracle, Teradata.
Confidential, NC
Hadoop Data Analyst
Responsibilities:
- Implemented solutions for ingesting data from various sources and processing it with big data technologies such as Hive, Pig, Sqoop, HBase, and MapReduce.
- Designed and developed a daily process for incremental import of raw data from Oracle into Hive tables using Sqoop.
- Queried data from HBase for lookups, grouping, and sorting.
- Extensively used HiveQL to query data in Hive tables and to load data into Hive tables.
- Worked extensively with partitioned and bucketed tables in Hive, designed both managed and external tables, and optimized Hive queries.
- Assisted the analytics team in writing Pig scripts to perform further detailed analysis of the data.
- Explored Spark to improve the performance of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, etc.
Environment: Cloudera CDH 5, Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Spark, HBase, SparkContext, Spark SQL, DataFrames.
Confidential
Software Systems Analyst
Responsibilities:
- Conducted strategic IT chain management assessments based on statistical analysis, improving the workflow and efficiency of the client’s finance applications.
- Automated the client's batch recovery process, reducing recovery time by 30%.
- Proposed and implemented several robust workaround techniques, resulting in a 15% overall decline in customer incidents.
- Accurately recovered 120K customer records within 2 days during an application malfunction.
- Coordinated with onshore and offshore teams and organized weekly team meetings.
- Performed root cause analyses of various recurring enterprise application issues.
Environment: Linux (Ubuntu, CentOS, Red Hat Linux), Windows XP/7/8/10, OS X 10.11
