
Hadoop Analytics Developer Resume


SUMMARY

  • 5+ years of IT experience, including 4 years with Big Data, Hadoop, and its ecosystem components.
  • 2+ years of experience in BI/data analytics, including Tableau, Python, SSRS, R, and statistical algorithms such as linear and logistic regression, KNN, SVM, random forest, and k-means.
  • Hands-on experience in Big Data development using technologies such as HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, and Flume.
  • Experience in processing structured, semi-structured, and unstructured data using different tools and frameworks in the Hadoop ecosystem.
  • Experienced in ingesting data from RDBMS systems into Hadoop using Sqoop.
  • Expertise in Hadoop architecture and its components, such as HDFS, NameNode, DataNode, and MapReduce/YARN.
  • Experience with commercial Hadoop distributions: Hortonworks Data Platform (HDP) and Cloudera Distribution including Hadoop (CDH).
  • Good experience using Big Data components on the Azure cloud.
  • Scheduled and monitored job workflows, identified failures with Oozie, and integrated jobs with ZooKeeper.
  • Experienced with various methods and custom tools for validating ingested data.
  • Expertise in Hive storage formats (Avro, Parquet, SequenceFile) and their related performance issues.
  • Working experience ingesting data into and querying data out of HBase.
  • Good knowledge of data processing techniques using Pig and MapReduce to handle and format data as required.
  • Experienced with the YARN resource scheduler for controlling resource allocation across groups in an organization.
  • Good knowledge of NoSQL databases such as HBase and Cassandra.
  • Used different Spark modules: Spark Core, RDDs, DataFrames, and Spark SQL (a minimal sketch follows this list).
  • Proficient in OOP and Java concepts such as multithreading and collections needed for writing MapReduce jobs.
  • Very good experience with UNIX commands and shell scripting.
  • Experience with version-control platforms such as GitHub and Bitbucket.
  • Good knowledge of and experience in Python.
  • Expertise in SSIS and SSRS, and in writing SQL queries, dynamic queries, subqueries, and complex joins for complex stored procedures, triggers, user-defined functions, views, and cursors.
  • Flexible and versatile, adapting quickly to new environments.
  • Effective both on a team and independently, with excellent organizational and interpersonal skills.
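
As a minimal sketch of those Spark modules (Spark Core/RDDs, DataFrames, and Spark SQL); the input path and column names here are hypothetical, not from an actual project:

    # Minimal PySpark sketch: the same data handled as an RDD, as a
    # DataFrame, and through Spark SQL. Path and columns are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("summary-sketch").getOrCreate()

    # Spark Core / RDD: parse raw delimited lines and drop malformed records.
    raw = spark.sparkContext.textFile("/data/raw/sales.csv")  # hypothetical path
    rows = raw.map(lambda line: tuple(line.split(","))).filter(lambda f: len(f) == 3)

    # DataFrame: impose column names on the cleaned RDD.
    df = rows.toDF(["store_id", "sku", "amount"])

    # Spark SQL: register the DataFrame as a view and query it.
    df.createOrReplaceTempView("sales")
    spark.sql("SELECT store_id, COUNT(*) AS n FROM sales GROUP BY store_id").show()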

TECHNICAL SKILLS

Hadoop / Big Data: HDFS, MapReduce, YARN, Hive, HBase, Pig, Sqoop, Flume, Oozie, Tez, ZooKeeper, Kafka, Solr, Spark Streaming, Avro, RC, ORC, Parquet and Ambari

NoSQL Databases: HBase, Cassandra

Languages: Java, Scala, Python

Scripting/Query: Shell Scripting, SQL, R

IDEs: Eclipse, Spring Tool Suite, IntelliJ IDEA

Version Control: Git, SVN

Databases: Oracle 10g/9i/8i, DB2, MySQL

BI/Visualization: Tableau, SSRS

Operating Systems: Windows, Linux, UNIX family, macOS

PROFESSIONAL EXPERIENCE

Confidential

Hadoop Analytics Developer

Responsibilities:

  • Moved data from relational database management systems (RDBMS) into the Hadoop Distributed File System (HDFS) using the Hive architecture.
  • Used Hortonworks Data Platform on the Azure cloud.
  • Used Sqoop to ingest data from various relational data sources.
  • Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables.
  • Worked with the NoSQL database HBase to store cleansed data.
  • Designed, built, and supported pipelines for data ingestion, transformation, conversion, and validation.
  • Worked on the design of a Hive data store holding data from various sources.
  • Installed and configured Hive, and wrote Hive UDFs.
  • Worked extensively with Hive data stores in text, Avro, and RCFile storage formats.
  • Tuned the performance of Hive queries through partitioning and bucketing.
  • Participated in the design of Flume source/sink usage patterns for data ingestion.
  • Used Spark with Scala to retrieve data.
  • Created Hive external tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Performed data standardization using Pig scripts.
  • Wrangled large datasets (acquired and cleaned the data) and analyzed trends through visualizations built with Matplotlib in Python.
  • Applied machine learning algorithms such as linear regression, random forests, k-means, and KNN for data analysis (see the sketch after this list).
  • Performed statistical analysis on retail data using R and Python.
  • Created interactive dashboards and stories in Tableau, with visualizations including bar, line, and pie charts, maps, scatter plots, Gantt charts, bubble charts, histograms, bullet graphs, heat maps, and highlight tables.
  • Connected Hive to Tableau and created charts that helped users understand trends.
  • Automated the daily KPI reports into a single Tableau dashboard, reducing manual intervention by 95%.
  • Participated in the testing process, including unit testing, performance testing, and manual testing of the system.
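
A hedged sketch of the kind of model comparison described above; the CSV file, feature columns, and target label are hypothetical stand-ins for the retail data:

    # Sketch only: compares a few of the classifiers named above on a
    # hypothetical cleaned retail extract; file and column names are invented.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    df = pd.read_csv("retail_clean.csv")      # hypothetical cleaned dataset
    X = df.drop(columns=["churned"])          # hypothetical target column
    y = df["churned"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42)

    for name, model in [
        ("logistic regression", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("random forest", RandomForestClassifier(n_estimators=200, random_state=42)),
    ]:
        model.fit(X_train, y_train)
        print(name, accuracy_score(y_test, model.predict(X_test)))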

Confidential

Hadoop Developer

Responsibilities:

  • Actively gathered requirements from end users and modified various technical and functional specifications.
  • Created Hive tables and loaded retail transactional data using Sqoop.
  • Built data stores for analytics and reporting teams from the data lake.
  • Managed data across different databases, such as ingesting data into Cassandra and consuming the ingested data in Hadoop.
  • Created Hive external tables to perform Extract, Transform, Load (ETL/ELT) operations on data generated daily.
  • Created Hive tables, loaded them with data, and wrote Hive queries that invoke and run MapReduce jobs in the backend.
  • Integrated MapReduce with HBase to import large volumes of data using MapReduce programs.
  • Used ZooKeeper to coordinate and run different cluster services.
  • Converted Hive queries into Spark transformations using Spark RDDs in Python (see the sketch after this list).
  • Developed custom MapReduce programs and user-defined functions (UDFs) in Hive to transform large volumes of data as required.
  • Performed transformations and actions on RDDs using Spark.
  • Performed complex custom analytics as needed by clients.
  • Performed partitioning, dynamic partitioning, and bucketing on Hive tables for performance tuning.
  • Created bar charts compiled from the data sets, and added trend lines and forecasting.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Used SQL to query databases and performed various validation and mapping activities.
  • Performed high-level data analysis using Hive and SQL.
  • Performed clustering and classification using statistical concepts.
  • Created multiple dashboards on sales data while maintaining company standards.
  • Retrieved data from the data warehouse and generated a series of meaningful business reports using SSRS.
  • Connected Tableau to Hive to generate on-the-fly reports highlighting key metrics for senior leadership.
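
A hedged sketch of rewriting one such Hive aggregation as Spark RDD transformations and actions in Python; the Hive table and column names are hypothetical:

    # Sketch only: a Hive GROUP BY rewritten with Spark RDD transformations
    # and actions. The Hive table and columns are hypothetical.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-to-rdd")
             .enableHiveSupport()
             .getOrCreate())

    # Hive equivalent:
    #   SELECT store_id, SUM(amount) FROM sales.transactions GROUP BY store_id;
    rdd = spark.table("sales.transactions").rdd          # hypothetical table

    totals = (rdd
              .map(lambda r: (r["store_id"], float(r["amount"])))  # transformation
              .reduceByKey(lambda a, b: a + b))                    # transformation

    for store, total in totals.take(10):                           # action
        print(store, total)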

Confidential

Core JAVA/SQL Developer

Responsibilities:

  • Worked with business analysts on defining and refining requirements and on estimating and analyzing enhancements and changes.
  • Collaborated with developers, support teams, and testers to ensure integrated code functionality.
  • Participated in daily scrum stand-ups within two-week agile sprints.
  • Set up and administered SQL Server database security using profiles, database privileges, and roles.
  • Used JDBC and Spring MVC to retrieve and update customer information in the database.
  • Implemented Spring MVC and dependency injection.
  • Made extensive use of object-oriented programming (OOP) concepts, collections, generics, multithreading, exception handling, and design patterns for functionality such as portfolio summary and user information.
  • Developed SQL Server Integration Services (SSIS) packages to transform data from SQL Server 2005 to SQL Server 2008, and created interface stored procedures used in SSIS to load and transform data into the database.
  • Identified and worked with parameters (e.g., cascading parameters) for parameterized reports in SSRS.
  • Created and managed schema objects such as tables, views, and indexes, with referential integrity, per user requirements.
  • Developed various T-SQL stored procedures and triggers based on the requirements (see the sketch after this list).
  • Defined check constraints, business rules, indexes, and views.
  • Tuned SQL queries and stored procedures using SQL Profiler and the Index Tuning Wizard.
  • Enforced referential integrity of data by creating constraints.
  • Created derived columns from existing columns per the given requirements.
  • Performed development support, document reviews, test planning, and system integration.
  • Created and maintained indexes for fast and efficient reporting processes.
  • Managed historical data from heterogeneous data sources (e.g., Excel, Access).
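
A hedged sketch of invoking one such parameterized T-SQL stored procedure, written in Python (via pyodbc) for consistency with the other sketches; the server, database, procedure name, and parameters are all hypothetical:

    # Sketch only: calling a parameterized T-SQL stored procedure through
    # pyodbc. Server, database, procedure, and parameters are hypothetical.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=Portfolio;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()

    # Cascading-parameter pair: a region first, then a store within it.
    cursor.execute(
        "EXEC dbo.usp_PortfolioSummary @Region = ?, @StoreId = ?",
        ("West", 42),
    )
    for row in cursor.fetchall():
        print(row)

    conn.close()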
