SUMMARY:
- Senior Hadoop Developer with 7+ years of IT experience in software development and support, including experience developing strategic methods for deploying big data technologies to efficiently solve large-scale data processing requirements.
- Successful history of effectively implementing systems and directing key initiatives. Deep-rooted interest in designing and crafting efficient, modern software.
- Skilled troubleshooter with a proven ability to deliver creative and effective solutions to complex problems.
- Hadoop Developer: Experience in installing, configuring, maintaining, and monitoring Hadoop clusters (Apache, Cloudera, and Sandbox).
- Hadoop Distributions: Hortonworks, Cloudera CDH4, CDH5, and Apache Hadoop.
- Hadoop Ecosystem: Hands-on experience with the Hadoop ecosystem, including HDFS, Sqoop, MapReduce, YARN, Pig, Hive, Impala, ZooKeeper, and Oozie.
- Worked on enterprise-level Cloudera Hadoop clusters ranging from 500 TB to 2 PB in size.
- Data Ingestion: Designed data flows and configured the individual components using Flume. Efficiently transferred bulk data to and from traditional databases with Sqoop.
- Data Storage: Experience maintaining distributed storage on HDFS.
- Data Processing: Processed data using MapReduce and Spark.
- Data Analysis: Expertise in analyzing data using Pig scripts, Hive queries, Spark (Python), and Impala.
- Migrated SQL stored procedures into Hadoop transformations (a minimal sketch follows this summary).
- Management and Monitoring: Maintained and coordinated the ZooKeeper service in addition to designing and monitoring Oozie workflows. Used the Azkaban batch scheduler to control job workflows.
- Messaging System: Used Kafka in a proof of concept to achieve faster message transfer across systems.
- Scripting: Expertise in Hive, Pig, Impala, shell scripting, Perl scripting, and Python.
- Cloud Platforms: Configured Hadoop clusters in OpenStack and Amazon Web Services (AWS).
- Visualization Integration: Integrated Tableau and Alteryx using the Impala and Hive ODBC connectors.
- Java/J2EE: Expertise in Spring Web MVC and Hibernate. Proficient in HQL (Hibernate Query Language).
- Project Management: Experience in Agile and Scrum project management with Jira.
- Web Interface Design: HTML, CSS, JavaScript, and Bootstrap.
- A quick learner with a proclivity for new technology and tools.
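As a concrete illustration of the migration pattern mentioned above, the following is a minimal, hypothetical sketch of a stored-procedure-style daily aggregation re-expressed as a Spark (Scala) transformation over Hive tables; the table and column names are invented for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{countDistinct, sum}

object DailySummarySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-summary-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Equivalent of a stored procedure that rolled up transactions per customer per day.
    spark.table("staging.transactions")
      .groupBy("customer_id", "txn_dt")
      .agg(sum("amount").as("total_amount"),
           countDistinct("txn_id").as("txn_count"))
      .write
      .mode("overwrite")
      .saveAsTable("mart.daily_customer_summary")

    spark.stop()
  }
}
```

The same rollup could equally be expressed as a Hive or Impala query; Spark is shown here because the summary lists it for both processing and analysis.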
TECHNICAL SKILLS:
Languages: Java/J2EE, C++, C, Python, Scala, Perl, JavaScript
Big Data Ecosystem: MapReduce, Hive, Impala, Pig, Sqoop, ZooKeeper, Oozie, Spark, Flume, HBase, Hue
Databases: Netezza, DB2, Teradata, MySQL, Oracle
Operating Systems: Linux, Windows, UNIX
Tools and Utilities: Maven, Eclipse, IntelliJ, NetBeans, JDBC, JSON, Toad, Autosys, JUnit, PuTTY, WinSCP, FileZilla, Splunk, Log4j
CHRONOLOGICAL SUMMARY OF EXPERIENCE:
Senior Hadoop Developer
Confidential, Wilmington, DE
Responsibilities:
- Involved in the design and development of common frameworks and utilities across work streams.
- Coordinated with business users to understand business requirements as part of development activities.
- Implemented a common JDBC utility for data sourcing in Spark (sketched after this list).
- Improved the performance of Spark jobs by tuning job configuration settings.
- Optimized and tuned Spark applications using storage-level mechanisms such as persist and cache.
- Used HBase tables to store Kafka offset values (also sketched after this list).
- Handled the data enrichment process for all dimension and fact tables using Spark.
- Exported final tables to the Essbase multidimensional database for business validations.
- Handled Spark return codes by adding a custom method for jobs running in cluster mode.
- Used broadcast variables for input control files as part of the enrichment process.
- Used Splunk for log analysis in the UAT and production environments to support operations.
- Resolved a Spark precision-loss issue by using the Scala BigDecimal datatype.
- Imported data from different RDBMS systems such as Oracle and Teradata.
- Implemented a custom JAR utility for Excel-to-CSV file conversion.
- Used compression codecs such as Snappy and Gzip for data loads and archival.
- Scheduled incremental stats and compute stats as daily and weekly jobs to overcome memory issues and long-running queries in Impala.
- Wrote JIL scripts to schedule jobs with the AutoSys automation tool.
- Automated daily, monthly, quarterly, and ad hoc data loads in AutoSys to run per the scheduled calendar dates.
- Involved in production support, BAU activities, and release management.
- Expertise in writing custom UDFs in Hive.
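The following is a minimal sketch of the pattern behind the JDBC sourcing utility, broadcast control file, persist/cache tuning, and BigDecimal precision handling described above, assuming a Hive-enabled Spark session; the connection string, credentials, table, and column names are invented placeholders rather than the project's actual code.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.storage.StorageLevel

object JdbcSourcingSketch {

  // Generic JDBC read shared across work streams (url, table, and credentials are placeholders).
  def sourceTable(spark: SparkSession, url: String, table: String,
                  user: String, password: String): DataFrame =
    spark.read.format("jdbc")
      .option("url", url)
      .option("dbtable", table)
      .option("user", user)
      .option("password", password)
      .load()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("enrichment-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Broadcast the small control file so every executor holds a local copy.
    val controlCodes = spark.sparkContext.broadcast(
      spark.read.option("header", "true")
        .csv("/data/control/control_codes.csv")
        .collect()
        .map(_.getString(0))
        .toSet)

    // Persist the sourced dimension because it is reused by several downstream joins.
    val dim = sourceTable(spark, "jdbc:oracle:thin:@//dbhost:1521/ORCL",
                          "DIM_ACCOUNT", "app_user",
                          sys.env.getOrElse("DB_PASSWORD", ""))
      .persist(StorageLevel.MEMORY_AND_DISK)

    // Scala BigDecimal keeps exact decimal arithmetic, avoiding double-precision loss on amounts.
    val normalizeAmount = udf((amt: java.math.BigDecimal) =>
      Option(amt).map(a => BigDecimal(a).setScale(2, BigDecimal.RoundingMode.HALF_UP)).orNull)

    val enriched = dim
      .filter(row => controlCodes.value.contains(row.getAs[String]("ACCT_TYPE")))
      .withColumn("BALANCE", normalizeAmount(col("BALANCE")))

    enriched.write.mode("overwrite").saveAsTable("enriched.dim_account")
    spark.stop()
  }
}
```

Job-level settings such as executor memory and shuffle partitions would be supplied through spark-submit rather than hard-coded in a utility like this.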
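Similarly, a small illustrative sketch of keeping Kafka offsets in an HBase table so that a restarted job can resume from the last committed position; the table name, column family, and row-key layout are assumptions, not the actual design.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

// Hypothetical sketch: persist the last committed Kafka offset per topic-partition in HBase
// so that a restarted job can resume from where it left off.
object OffsetStoreSketch {
  private val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
  private val table      = connection.getTable(TableName.valueOf("stream_offsets"))

  def save(topic: String, partition: Int, offset: Long): Unit = {
    val put = new Put(Bytes.toBytes(s"$topic:$partition"))
    put.addColumn(Bytes.toBytes("o"), Bytes.toBytes("offset"), Bytes.toBytes(offset))
    table.put(put)
  }

  def load(topic: String, partition: Int): Option[Long] = {
    val result = table.get(new Get(Bytes.toBytes(s"$topic:$partition")))
    Option(result.getValue(Bytes.toBytes("o"), Bytes.toBytes("offset")))
      .map(bytes => Bytes.toLong(bytes))
  }
}
```

In a real streaming job these calls would bracket each processed batch; connection cleanup and error handling are omitted for brevity.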
Environment: Cloudera Hadoop, Spark, Scala, Hive, HBase, Kafka, Essbase, Shell, Sqoop, XML workflows, Splunk, Teradata, Oracle, Hue, Impala, SVN, Bitbucket.
Senior Hadoop Developer
Confidential, Charlotte, NC
Responsibilities:
- Coordinated with business users to gather business requirements and interacted with technical leads on application-level design.
- Implemented custom file upload processes in PySpark.
- Implemented a common JDBC utility for data sourcing in Spark.
- Optimized and tuned Spark applications using persist, cache, and broadcast.
- Improved the performance of Spark jobs by tuning job configuration settings.
- Involved in edge-node migration for an enterprise-level cluster and rebuilt the application to the new architecture standards.
- Created workflows in Oozie along with managing/coordinating the jobs and combining multiple jobs sequentially into one unit of work.
- Imported and exported data from different RDBMS systems such as Oracle, Teradata, SQL Server, and Netezza, and from Linux systems such as SAS Grid.
- Handled semi-structured data such as Excel and CSV files, imported from SAS Grid to HDFS via an SFTP process (see the sketch after this list).
- Ingested data into Hive tables using Sqoop and the SFTP process.
- Used compression codecs such as Snappy and Gzip for data loads and archival.
- Created data pipelines and implemented data transformations using Hadoop and Spark.
- Performed data-level transformations in intermediate tables before forming final tables.
- Handled data integrity checks using Hive queries, Hadoop, and Spark.
- Exposed all reporting tables in Tableau through the Impala server for better performance.
- Installed and implemented Kerberos authentication for applications using keytabs.
- Wrote JIL scripts to schedule jobs with the AutoSys automation tool.
- Automated daily, monthly, quarterly, and ad hoc data loads in AutoSys to run per the scheduled calendar dates.
- Involved in production support, BAU activities, and release management.
- Expertise in writing custom UDFs in Hive.
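Below is a minimal sketch of the kind of file load described above: CSV files landed on HDFS by the SFTP pull are read and written into a Snappy-compressed, partitioned Hive table that Impala and Tableau can query. The bullets mention PySpark; this sketch uses Scala only for consistency with the other examples in this document, and all paths, table names, and columns are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.current_date

object CsvToHiveLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-to-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // CSV files dropped on HDFS by the SFTP pull from SAS Grid (path is a placeholder).
    val landed = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/landing/sas_grid/accounts/")

    // Write a final table partitioned by load date and compressed with Snappy,
    // so downstream Impala/Tableau queries can prune partitions.
    landed
      .withColumn("load_dt", current_date())
      .write
      .mode("overwrite")
      .format("parquet")
      .option("compression", "snappy")
      .partitionBy("load_dt")
      .saveAsTable("finance.accounts")

    spark.stop()
  }
}
```

Kerberos authentication, the Sqoop pulls from the RDBMS sources, and the AutoSys scheduling sit outside this sketch.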
Environment: Cloudera Hadoop, PySpark, Hive, Pig, Shell, Sqoop, Oozie workflows, Teradata, Netezza, SQL Server, Oracle, Hue, Impala, Jenkins, Kerberos.
Hadoop Developer
Confidential, Denver, CO
Responsibilities:
- Planned, installed and configured the distributed Hadoop Clusters.
- Used Sqoop to load data from MySQL and other sources into HDFS on a regular basis.
- Configured Hadoop tools like Hive, Pig, Zookeeper, Flume, Impala and Sqoop.
- Ingested data into Hive tables from MySQL using Sqoop, and moved data between Pig and Hive.
- Designed the flow and configured the individual components using Flume.
- Transferred bulk data from and to traditional databases with Sqoop.
- Migrated the SQL stored procedures into Hadoop transformations.
- Wrote batch operations across multiple rows for DDL (Data Definition Language) and DML (Data Manipulation Language) for improved performance using the client API calls.
- Grouped and filtered data using Hive queries (HQL) and Pig Latin scripts.
- Queried both managed and external Hive tables using Impala.
- Implemented partitioning and bucketing in Hive for more efficient querying of data.
- Created workflows in Oozie along with managing/coordinating the jobs and combining multiple jobs sequentially into one unit of work.
- Maintained distributed storage in HDFS and columnar storage in HBase.
- Analyzed data using Pig scripts, Hive queries, and Impala.
- Maintained and coordinated the ZooKeeper service in addition to designing and monitoring Oozie workflows.
- Designed and created both managed and external Hive tables depending on the requirement.
- Wrote custom UDFs in Hive (a minimal sketch follows this list).
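A minimal sketch of a custom Hive UDF like those mentioned above. Hive UDFs are most often written in Java; since any JVM language works, this illustrative version uses Scala for consistency with the other sketches here, and the function name and behavior are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: strip every non-digit character from a phone-number column.
class NormalizePhone extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.replaceAll("[^0-9]", ""))
  }
}
```

Packaged into a JAR, the class would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from a query.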
Environment: Cloudera distribution CDH4, Hadoop, MapReduce, MySQL, Linux, Hive, Pig, Impala, Sqoop, ZooKeeper.
Hadoop Developer
Confidential, MA
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, and Sqoop.
- Installed Hadoop (MapReduce, HDFS) and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Coordinated with business customers to gather business requirements, interacted with technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Extensively involved in the design phase and delivered design documents.
- Involved in testing and coordinated with the business during user testing.
- Imported and exported data into HDFS and Hive using Sqoop.
- Wrote Hive jobs to parse the logs and structure them in a tabular format to facilitate effective querying of the log data (see the sketch after this list).
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Experienced in defining job flows.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Experienced in managing and reviewing the Hadoop log files.
- Used Pig as an ETL tool to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
- Loaded and transformed large sets of structured and semi-structured data.
- Responsible for managing data coming from different sources.
- Involved in creating Hive Tables, loading data and writing Hive queries.
- Utilized the Cloudera distribution of Apache Hadoop.
- Created Data model for Hive tables.
- Involved in Unit testing and delivered Unit test plans and results documents.
- Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization purposes.
- Worked on Oozie workflow engine for job scheduling.
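The bullets above describe Pig and Hive jobs that compile down to MapReduce for log cleaning and parsing. As a rough, language-consistent illustration of the same idea, here is a hand-written, map-only MapReduce job in Scala that parses raw access-log lines into tab-separated columns a Hive table could be defined over; the log format, field layout, and paths are assumptions.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Map-only job: turn raw access-log lines into tab-separated records; malformed lines are dropped.
class LogParseMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
  private val LogLine = """^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}).*""".r

  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit =
    value.toString match {
      case LogLine(ip, ts, method, uri, status) =>
        context.write(NullWritable.get,
                      new Text(Seq(ip, ts, method, uri, status).mkString("\t")))
      case _ => // part of data cleaning: skip lines that do not match the expected format
    }
}

object LogParseJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "log-parse-sketch")
    job.setJarByClass(classOf[LogParseMapper])
    job.setMapperClass(classOf[LogParseMapper])
    job.setNumReduceTasks(0)                      // no reduce phase needed
    job.setOutputKeyClass(classOf[NullWritable])
    job.setOutputValueClass(classOf[Text])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```

A Hive external table defined over the job's output directory would then make the parsed columns queryable.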
Environment: HDFS, Hive, MapReduce, Shell, Pig, Sqoop, Oozie.
Java Developer
Confidential
Responsibilities:
- Designed the initial web/WAP pages for a better UI per the requirements.
- Involved in developing the functional flow of the mZone application.
- Integrated social media APIs into the application.
- Used AJAX and JavaScript to handle asynchronous requests to the server and CSS to handle the look and feel of the application.
- Involved in designing basic class diagrams, sequence diagrams, and event diagrams as part of the documentation.
- Gained good exposure in creating Hibernate POJOs and developed Hibernate mapping files.
- Worked on tuning of back-end Oracle stored procedures using TOAD.
- Used Hibernate, an object/relational mapping (ORM) solution, to map data from the MVC model to the Oracle relational data model with an SQL-based schema.
- Developed SQL queries and stored procedures using PL/SQL to retrieve data from and insert into multiple database schemas.
- Performed Unit Testing Using JUnit and Load testing using LoadRunner.
- Implemented Log4J to trace logs and to track information.
Environment: JSP, Struts, jQuery, Tomcat, CSS, JUnit, Log4j, SQL/PLSQL, Oracle 9i, Hibernate, Web services.
Java Developer
Confidential
Responsibilities:
- Involved in Requirements gathering, Requirements analysis, Design, Development, Integration and Deployment.
- Used JavaScript to perform client-side checks and validations.
- Extensively used Spring MVC framework to develop the web layer for the application. Configured DispatcherServlet in web.xml.
- Designed and developed the DAO layer using Spring and Hibernate, including the Criteria API.
- Created and generated Hibernate classes and XML configuration, and managed CRUD operations (create, read, update, and delete).
- Involved in writing HQL and SQL queries for the Oracle 10g database.
- Used log4j for logging messages.
- Developed unit test classes using JUnit.
- Developed Business components using Spring Framework and database connections using JDBC.
Environment: Spring Framework, Spring MVC, Hibernate, HQL, Eclipse, JavaScript, AJAX, XML, Log4j, Oracle 9i, Web Logic, TOAD.