Hadoop Analytics Developer Resume
SUMMARY
- Over 5 years of IT experience, including 4 years with Big Data, Hadoop, and its ecosystem components.
- 2+ years of experience in BI/data analytics, including Tableau, Python, SSRS, R, and statistical algorithms such as linear and logistic regression, KNN, SVM, random forest, and k-means.
- Hands-on experience in Big Data development using technologies such as HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, and Flume.
- Experience in processing structured, semi-structured, and unstructured data using different tools and frameworks in the Hadoop ecosystem.
- Experienced in ingesting data from RDBMS systems into Hadoop using Sqoop.
- Expertise in Hadoop architecture and its components, such as HDFS, NameNode, DataNode, and MapReduce/YARN.
- Experience with commercial Hadoop distributions: Hortonworks Data Platform (HDP) and Cloudera's Distribution Including Apache Hadoop (CDH).
- Good experience using big data components on the Azure cloud.
- Scheduled and monitored job workflows, identified failures with Oozie, and integrated jobs with ZooKeeper.
- Experienced with various methods and custom tools for validating ingested data.
- Expertise in Hive storage formats (Avro, Parquet, SequenceFile) and related performance issues.
- Working experience ingesting data into and querying data from HBase.
- Good knowledge of implementing data processing techniques with Pig and MapReduce to handle and format data as required.
- Experienced with the YARN resource scheduler for controlling resource allocation across groups in an organization.
- Good knowledge of NoSQL databases such as HBase and Cassandra.
- Used different Spark modules, including Spark Core, RDDs, DataFrames, and Spark SQL (see the sketch after this list).
- Proficient in OOP and Java concepts such as multithreading and collections needed for writing MapReduce jobs.
- Very good experience with UNIX commands and shell scripting.
- Experience with code-hosting platforms such as GitHub and Bitbucket.
- Good knowledge and experience in Python.
- Expertise in SSIS and SSRS, and in writing SQL queries, dynamic queries, subqueries, and complex joins to build complex stored procedures, triggers, user-defined functions, views, and cursors.
- Flexible and versatile, adapting readily to new environments.
- Effective working on a team or independently, with excellent organizational and interpersonal skills.
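A minimal PySpark sketch of the Spark modules mentioned above (Spark Core/RDDs, DataFrames, Spark SQL). All paths, table names, and column names are hypothetical placeholders, not drawn from any actual project:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-modules-demo").getOrCreate()

# Spark Core / RDD API: a low-level transformation plus an action
rdd = spark.sparkContext.textFile("/data/raw/events.txt")  # hypothetical path
non_empty_lines = rdd.filter(lambda line: line.strip() != "").count()

# DataFrame API: schema-aware, optimizer-friendly processing
df = spark.read.parquet("/data/curated/events")  # hypothetical path
daily_counts = df.groupBy("event_date").count()

# Spark SQL: register a temporary view and query it with SQL
df.createOrReplaceTempView("events")
top_types = spark.sql(
    "SELECT event_type, COUNT(*) AS n FROM events "
    "GROUP BY event_type ORDER BY n DESC LIMIT 10"
)
top_types.show()

spark.stop()
```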
TECHNICAL SKILLS
Hadoop / Big Data: HDFS, MapReduce, YARN, Hive, HBase, Pig, Sqoop, Flume, Oozie, Tez, ZooKeeper, Kafka, Solr, Spark Streaming, Avro, RCFile, ORC, Parquet, Ambari
NoSQL Databases: HBase, Cassandra
Languages: Java, Scala, Python
Scripting/Query: Shell Scripting, SQL, R
IDEs: Eclipse, Spring Tool Suite, IntelliJ IDEA
Version Control: Git, SVN
Databases: Oracle 10g/9i/8i, DB2, MySQL
BI/Visualization: Tableau, SSRS
Operating Systems: Windows, Linux, UNIX, macOS
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Analytics Developer
Responsibilities:
- Worked on moving data from relational database management systems (RDBMS) into the Hadoop Distributed File System (HDFS) and exposing it through Hive.
- Used the Hortonworks Data Platform on the Azure cloud.
- Used Sqoop to ingest data from various relational data sources.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables.
- Worked with the NoSQL database HBase to store cleansed data.
- Designed, built, and supported pipelines for data ingestion, transformation, conversion, and validation.
- Worked on the design of a Hive data store holding data from various sources.
- Installed and configured Hive and wrote Hive UDFs.
- Worked extensively with Hive data stores in text, Avro, and RCFile storage formats.
- Tuned Hive query performance using partitioning and bucketing.
- Participated in the design of Flume source/sink usage patterns for data ingestion.
- Worked on Spark to retrieve data using Scala.
- Created Hive external tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Performed data standardization using Pig scripts.
- Wrangled large datasets in Python (acquired and cleaned the data) and analyzed trends with matplotlib visualizations (see the sketch after this list).
- Utilized machine learning algorithms such as linear regression, random forests, k-means, and KNN for data analysis.
- Performed statistical analysis on retail data using R and Python.
- Created interactive dashboards and stories in Tableau, including visualizations with bar, line, and pie charts, maps, scatter plots, Gantt charts, bubble charts, histograms, bullet charts, heat maps, and highlight tables.
- Connected Hive to Tableau and created charts to help users understand trends.
- Consolidated the daily KPI reports into a single automated Tableau dashboard, reducing manual intervention by 95%.
- Participated in the testing process, including unit, performance, and manual testing of the system.
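A minimal sketch of the Python wrangling and visualization workflow described above, using pandas and matplotlib. The file name and column names are hypothetical examples, not the original retail dataset:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Acquire and clean: parse dates, drop incomplete rows and duplicates
df = pd.read_csv("retail_sales.csv", parse_dates=["order_date"])  # hypothetical file
df = df.dropna(subset=["order_date", "sales"]).drop_duplicates()

# Analyze the trend: aggregate sales by month
monthly = df.set_index("order_date")["sales"].resample("M").sum()

# Visualize with matplotlib
fig, ax = plt.subplots()
monthly.plot(ax=ax)
ax.set_xlabel("Month")
ax.set_ylabel("Total sales")
ax.set_title("Monthly sales trend")
plt.tight_layout()
plt.show()
```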
Confidential
Hadoop Developer
Responsibilities:
- Actively gathered requirements from end users and helped modify various technical and functional specifications.
- Created Hive tables and loaded retail transactional data using Sqoop.
- Built data stores for analytics and reporting teams from the data lake.
- Managed data across different databases, for example ingesting data into Cassandra and consuming the ingested data in Hadoop.
- Created Hive external tables to perform Extract, Transform, Load (ETL/ELT) operations on data generated daily.
- Created Hive tables, loaded them with data, and wrote Hive queries that invoke MapReduce jobs in the backend.
- Integrated MapReduce with HBase to import large volumes of data using MapReduce programs.
- Used ZooKeeper to coordinate and run different cluster services.
- Converted Hive queries into Spark transformations using Spark RDDs in Python (see the sketch after this list).
- Developed custom MapReduce programs and user-defined functions (UDFs) in Hive to transform large volumes of data as required.
- Performed transformations and actions on RDDs using Spark.
- Performed complex custom analytics as needed by clients.
- Performed partitioning, dynamic partitioning, and bucketing on Hive tables for performance tuning.
- Created bar charts compiled from the data sets and added trend lines and forecasting.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Used SQL to query databases and performed various validation and mapping activities.
- Performed high level data analysis using Hive and SQL.
- Performed clustering and classification using statistical concepts.
- Created multiple dashboards on sales data while maintaining company standards.
- Retrieved data from the data warehouse and generated a series of meaningful business reports using SSRS.
- Connected Tableau to Hive to generate on-the-fly reports highlighting key metrics for senior leadership.
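A minimal sketch of rewriting a Hive aggregation as Spark RDD transformations in Python, as described above. The table and column names (sales, store_id, amount) are hypothetical:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-to-rdd")
         .enableHiveSupport()
         .getOrCreate())

# Equivalent Hive query:
#   SELECT store_id, SUM(amount) FROM sales GROUP BY store_id;
rows = spark.table("sales").rdd  # read the Hive table as an RDD of Rows

totals = (rows
          .map(lambda r: (r["store_id"], float(r["amount"])))  # transformation
          .reduceByKey(lambda a, b: a + b))                    # transformation

for store_id, total in totals.collect():                       # action
    print(store_id, total)

spark.stop()
```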
Confidential
Core JAVA/SQL Developer
Responsibilities:
- Worked with business analysts to define and refine requirements and to estimate and analyze enhancements and changes.
- Collaborated with developers, support teams, and testers to ensure integrated code functionality.
- Followed an Agile process with two-week sprints and daily stand-ups.
- Set up and administered SQL Server database security using profiles, database privileges, and roles.
- Used JDBC and Spring MVC to retrieve/update customer information to/from the database.
- Implemented Spring MVC and dependency injection.
- Made extensive use of object-oriented programming (OOP) concepts, collections, generics, multithreading, exception handling, and design patterns for functionality such as portfolio summaries and user information.
- Developed SQL Server Integration Services (SSIS) packages to transform data from SQL Server 2005 to SQL Server 2008, and created interface stored procedures used in SSIS to load and transform data into the database.
- Identified and worked with parameters (e.g., cascading parameters) for parameterized reports in SSRS.
- Created and managed schema objects such as tables, views, and indexes based on user requirements.
- Developed various T-SQL stored procedures and triggers based on requirements (see the sketch after this list).
- Defined check constraints, business rules, indexes, and views.
- Tuned SQL queries and stored procedures using SQL Profiler and the Index Tuning Wizard.
- Enforced referential integrity of data by creating constraints.
- Created derived columns from existing columns per the given requirements.
- Performed development support, document reviews, test planning, and system integration.
- Created and maintained indexes for fast, efficient reporting processes.
- Managed historical data from heterogeneous data sources (e.g., Excel, Access).
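A minimal sketch of a parameterized T-SQL stored procedure and a Python call to it via pyodbc. The server, database, and object names (SalesDb, dbo.GetOrdersByCustomer, dbo.Orders) are hypothetical placeholders, not the original schema:

```python
import pyodbc

# Hypothetical parameterized T-SQL stored procedure
CREATE_PROC = """
CREATE PROCEDURE dbo.GetOrdersByCustomer
    @CustomerId INT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT OrderId, OrderDate, Total
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;
END
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=SalesDb;Trusted_Connection=yes;"  # hypothetical
)
cur = conn.cursor()
cur.execute(CREATE_PROC)
conn.commit()

# Call the procedure with a bound parameter (ODBC call escape syntax)
for row in cur.execute("{CALL dbo.GetOrdersByCustomer (?)}", 42):
    print(row.OrderId, row.OrderDate, row.Total)

conn.close()
```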