Hadoop Analytics Developer Resume
SUMMARY
- Over 5 years of IT experience, including 4 years with Big Data, Hadoop, and its ecosystem components.
- 2+ years of experience in BI/data analytics, including Tableau, Python, SSRS, R, and statistical algorithms such as linear and logistic regression, KNN, SVM, random forest, and k-means.
- Hands-on experience in Big Data development using technologies such as HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, and Flume.
- Experience in processing structured, semi-structured, and unstructured data using different tools and frameworks in the Hadoop ecosystem.
- Experienced in ingesting data from RDBMS systems into Hadoop using Sqoop.
- Expertise in Hadoop architecture and its components, such as HDFS, NameNode, DataNode, and MapReduce/YARN.
- Experience with commercial Hadoop distributions: Hortonworks Data Platform (HDP) and Cloudera's Distribution Including Apache Hadoop (CDH).
- Good experience using big data components on the Azure cloud.
- Scheduled and monitored job workflows, identified failures with Oozie, and integrated jobs with ZooKeeper.
- Experienced with various methods and custom tools for validating ingested data.
- Expertise in Hive storage formats (Avro, Parquet, SequenceFile) and related performance issues.
- Working experience ingesting data into and querying data from HBase.
- Good knowledge of implementing data processing techniques with Pig and MapReduce to handle and format data as required.
- Experienced with the YARN resource scheduler for controlling resource allocation across groups in an organization.
- Good knowledge of NoSQL databases such as HBase and Cassandra.
- Used different Spark modules, including Spark Core, RDDs, DataFrames, and Spark SQL (see the sketch after this list).
- Proficient in OOP and Java concepts such as multithreading and collections needed for writing MapReduce jobs.
- Very good experience with UNIX commands and shell scripting.
- Experience with code-hosting platforms such as GitHub and Bitbucket.
- Good knowledge and experience in Python.
- Expertise in SSIS and SSRS, and in writing SQL queries, dynamic queries, subqueries, and complex joins to build complex stored procedures, triggers, user-defined functions, views, and cursors.
- Flexible and versatile, adapting readily to new environments.
- Effective working on a team or independently, with excellent organizational and interpersonal skills.
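A minimal PySpark sketch of the Spark modules mentioned above (Spark Core/RDDs, DataFrames, Spark SQL). All paths, table names, and column names are hypothetical placeholders, not drawn from any actual project:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-modules-demo").getOrCreate()

# Spark Core / RDD API: a low-level transformation plus an action
rdd = spark.sparkContext.textFile("/data/raw/events.txt")  # hypothetical path
non_empty_lines = rdd.filter(lambda line: line.strip() != "").count()

# DataFrame API: schema-aware, optimizer-friendly processing
df = spark.read.parquet("/data/curated/events")  # hypothetical path
daily_counts = df.groupBy("event_date").count()

# Spark SQL: register a temporary view and query it with SQL
df.createOrReplaceTempView("events")
top_types = spark.sql(
    "SELECT event_type, COUNT(*) AS n FROM events "
    "GROUP BY event_type ORDER BY n DESC LIMIT 10"
)
top_types.show()

spark.stop()
```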
TECHNICAL SKILLS
Hadoop / Big Data: HDFS, MapReduce, YARN, Hive, HBase, Pig, Sqoop, Flume, Oozie, Tez, ZooKeeper, Kafka, Solr, Spark Streaming, Avro, RCFile, ORC, Parquet, Ambari
NoSQL Databases: HBase, Cassandra
Languages: Java, Scala, Python
Scripting/Query: Shell Scripting, SQL, R
IDEs: Eclipse, Spring Tool Suite, IntelliJ IDEA
Version Control: Git, SVN
Databases: Oracle 10g/9i/8i, DB2, MySQL
BI/Visualization: Tableau, SSRS
Operating Systems: Windows, Linux, UNIX, macOS
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Analytics Developer
Responsibilities:
- Worked on moving data from relational database management systems (RDBMS) into the Hadoop Distributed File System (HDFS) and exposing it through Hive.
- Used the Hortonworks Data Platform on the Azure cloud.
- Used Sqoop to ingest data from various relational data sources.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables.
- Worked with the NoSQL database HBase to store cleansed data.
- Designed, built, and supported pipelines for data ingestion, transformation, conversion, and validation.
- Worked on the design of a Hive data store holding data from various sources.
- Installed and configured Hive and wrote Hive UDFs.
- Worked extensively with Hive data stores in text, Avro, and RCFile storage formats.
- Tuned Hive query performance using partitioning and bucketing.
- Participated in the design of Flume source/sink usage patterns for data ingestion.
- Worked on Spark to retrieve data using Scala.
- Created Hive external tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Performed data standardization using Pig scripts.
- Wrangled large datasets in Python (acquired and cleaned the data) and analyzed trends with matplotlib visualizations (see the sketch after this list).
- Utilized machine learning algorithms such as linear regression, random forests, k-means, and KNN for data analysis.
- Performed statistical analysis on retail data using R and Python.
- Created interactive dashboards and stories in Tableau, including visualizations with bar, line, and pie charts, maps, scatter plots, Gantt charts, bubble charts, histograms, bullet charts, heat maps, and highlight tables.
- Connected Hive to Tableau and created charts to help users understand trends.
- Consolidated the daily KPI reports into a single automated Tableau dashboard, reducing manual intervention by 95%.
- Participated in the testing process, including unit, performance, and manual testing of the system.
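A minimal sketch of the Python wrangling and visualization workflow described above, using pandas and matplotlib. The file name and column names are hypothetical examples, not the original retail dataset:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Acquire and clean: parse dates, drop incomplete rows and duplicates
df = pd.read_csv("retail_sales.csv", parse_dates=["order_date"])  # hypothetical file
df = df.dropna(subset=["order_date", "sales"]).drop_duplicates()

# Analyze the trend: aggregate sales by month
monthly = df.set_index("order_date")["sales"].resample("M").sum()

# Visualize with matplotlib
fig, ax = plt.subplots()
monthly.plot(ax=ax)
ax.set_xlabel("Month")
ax.set_ylabel("Total sales")
ax.set_title("Monthly sales trend")
plt.tight_layout()
plt.show()
```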
Confidential
Hadoop Developer
Responsibilities:
- Actively gathered requirements from end users and helped modify various technical and functional specifications.
- Created Hive tables and loaded retail transactional data using Sqoop.
- Built data stores for analytics and reporting teams from the data lake.
- Managed data across different databases, for example ingesting data into Cassandra and consuming the ingested data in Hadoop.
- Created Hive external tables to perform Extract, Transform, Load (ETL/ELT) operations on data generated daily.
- Created Hive tables, loaded them with data, and wrote Hive queries that invoke MapReduce jobs in the backend.
- Integrated MapReduce with HBase to import large volumes of data using MapReduce programs.
- Used ZooKeeper to coordinate and run different cluster services.
- Converted Hive queries into Spark transformations using Spark RDDs in Python (see the sketch after this list).
- Developed custom MapReduce programs and user-defined functions (UDFs) in Hive to transform large volumes of data as required.
- Performed transformations and actions on RDDs using Spark.
- Performed complex custom analytics as needed by clients.
- Performed partitioning, dynamic partitioning, and bucketing on Hive tables for performance tuning.
- Created bar charts compiled from the data sets and added trend lines and forecasting.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Used SQL to query databases and performed various validation and mapping activities.
- Performed high level data analysis using Hive and SQL.
- Performed clustering and classification using statistical concepts.
- Created multiple dashboards on sales data while maintaining company standards.
- Retrieved data from the data warehouse and generated a series of meaningful business reports using SSRS.
- Connected Tableau to Hive to generate on-the-fly reports highlighting key metrics for senior leadership.
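A minimal sketch of rewriting a Hive aggregation as Spark RDD transformations in Python, as described above. The table and column names (sales, store_id, amount) are hypothetical:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-to-rdd")
         .enableHiveSupport()
         .getOrCreate())

# Equivalent Hive query:
#   SELECT store_id, SUM(amount) FROM sales GROUP BY store_id;
rows = spark.table("sales").rdd  # read the Hive table as an RDD of Rows

totals = (rows
          .map(lambda r: (r["store_id"], float(r["amount"])))  # transformation
          .reduceByKey(lambda a, b: a + b))                    # transformation

for store_id, total in totals.collect():                       # action
    print(store_id, total)

spark.stop()
```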
Confidential
Core JAVA/SQL Developer
Responsibilities:
- Worked with business analysts to define and refine requirements and to estimate and analyze enhancements and changes.
- Collaborated with developers, support teams, and testers to ensure integrated code functionality.
- Followed an Agile process with two-week sprints and daily stand-ups.
- Set up and administered SQL Server database security using profiles, database privileges, and roles.
- Used JDBC and Spring MVC to retrieve/update customer information to/from the database.
- Implemented Spring MVC and dependency injection.
- Made extensive use of object-oriented programming (OOP) concepts, collections, generics, multithreading, exception handling, and design patterns for functionality such as portfolio summaries and user information.
- Developed SQL Server Integration Services (SSIS) packages to transform data from SQL Server 2005 to SQL Server 2008, and created interface stored procedures used in SSIS to load and transform data into the database.
- Identified and worked with parameters (e.g., cascading parameters) for parameterized reports in SSRS.
- Created and managed schema objects such as tables, views, and indexes based on user requirements.
- Developed various T-SQL stored procedures and triggers based on requirements (see the sketch after this list).
- Defined check constraints, business rules, indexes, and views.
- Tuned SQL queries and stored procedures using SQL Profiler and the Index Tuning Wizard.
- Enforced referential integrity of data by creating constraints.
- Created derived columns from existing columns per the given requirements.
- Performed development support, document reviews, test planning, and system integration.
- Created and maintained indexes for fast, efficient reporting processes.
- Managed historical data from heterogeneous data sources (e.g., Excel, Access).
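A minimal sketch of a parameterized T-SQL stored procedure and a Python call to it via pyodbc. The server, database, and object names (SalesDb, dbo.GetOrdersByCustomer, dbo.Orders) are hypothetical placeholders, not the original schema:

```python
import pyodbc

# Hypothetical parameterized T-SQL stored procedure
CREATE_PROC = """
CREATE PROCEDURE dbo.GetOrdersByCustomer
    @CustomerId INT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT OrderId, OrderDate, Total
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;
END
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=SalesDb;Trusted_Connection=yes;"  # hypothetical
)
cur = conn.cursor()
cur.execute(CREATE_PROC)
conn.commit()

# Call the procedure with a bound parameter (ODBC call escape syntax)
for row in cur.execute("{CALL dbo.GetOrdersByCustomer (?)}", 42):
    print(row.OrderId, row.OrderDate, row.Total)

conn.close()
```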