Sr. Hadoop Developer Resume
Philadelphia, PA
SUMMARY
- IT professional with 8+ years of experience with distributed storage systems such as HDFS and HBase in Big Data environments.
- Excellent understanding of the complexities associated with Big Data, with expertise in developing modules and code in MapReduce, Hive, Pig, Sqoop, Apache Flume, and Apache Spark to address them.
- Highly skilled in analysis, design, and development of Big Data solutions in Scala, Spark, Hadoop, Pig, and HDFS environments; experienced in Java and J2EE.
- Experience using HCatalog with Hive, Pig, and HBase; experienced with NoSQL databases such as HBase and Cassandra.
- Good experience installing, configuring, and administering Hadoop clusters on the major Hadoop distributions, Hortonworks and Cloudera.
- Strong work experience with Kafka streaming to ingest data in real time or near real time.
- Expert in data processing tasks such as collecting, aggregating, and moving data from various sources using Kafka.
- Good experience developing solutions to analyze large datasets efficiently.
- Experience setting up clusters on Amazon EC2 and S3, including automating cluster provisioning and scaling in the AWS cloud.
- Familiar with various relational databases, including MS SQL Server and Teradata.
- Experience with the Oozie workflow engine, running workflows with Impala, Hadoop MapReduce, and Pig actions.
- Hands-on experience importing and exporting data using the Hadoop data-management tool Sqoop.
- Good experience with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service), and setting up EMR (Elastic MapReduce).
- Comprehensive knowledge of debugging, optimizing, and performance tuning of DB2, Oracle, and MySQL databases.
TECHNICAL SKILLS
Languages & Hadoop Components: HDFS, Sqoop, Flume, Hive, Pig, MapReduce, YARN, Oozie, Kafka, Spark, Impala, Storm, Hue, Zookeeper, Java, SQL.
BigData Platforms: Hortonworks, Cloudera, Amazon
Databases & NoSQL Databases: Oracle, MySQL, Microsoft SQL Server, HBase and Cassandra
Operating Systems: Linux, UNIX, Windows
Development Methodologies: Agile/Scrum, Waterfall
IDEs & Development Tools: Eclipse, NetBeans, IntelliJ, GitHub, Jenkins, Maven, Ambari
Programming Languages: C, C++, Java SE, XML, JSP/Servlets, Spring, HTML, JavaScript, jQuery, Web Services, Python, Scala, PL/SQL & Shell Scripting
PROFESSIONAL EXPERIENCE
Confidential - Philadelphia, PA
Sr. Hadoop Developer
Responsibilities:
- Applied transformations to data loaded into Spark DataFrames and performed in-memory computation to generate the output response.
- Developed multiple POCs using Spark with Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
- Migrated MapReduce programs to Spark transformations using Spark and Scala; initial versions had been implemented in Python (PySpark).
- Developed Scala scripts using both DataFrames/Spark SQL and RDDs in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop (see the sketch after this list).
- Used Hive to analyze the partitioned data and compute various metrics for reporting.
- Imported data from different sources such as HDFS into Spark DataFrames.
- Scheduled and executed workflows in Oozie to run Hive and Pig jobs
- Worked extensively with SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Reduced the latency of Spark jobs by tuning Spark configurations and applying other performance and optimization techniques.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Pig.
- Used Hive and Spark SQL connections to generate Tableau BI reports.
- Created partitions and buckets based on state to support further processing with bucket-based Hive joins.
- Created Hive generic UDFs to process business logic that varies by policy.
- Developed various data connections from data sources to SSIS and Tableau Server for report and dashboard development.
- Developed solutions utilizing Hadoop ecosystem components such as Hadoop, Spark, Hive, HBase, Pig, Sqoop, Oozie, Ambari, and ZooKeeper.
- Wrote MapReduce programs with the Java API to cleanse structured and unstructured data.
- Loaded data from MySQL and Teradata into HBase where necessary using Sqoop.
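A minimal sketch of the DataFrame-based aggregation and OLTP write-back described above, in Scala. The table, column, and connection names (sales.transactions, state, amount, the JDBC URL) are hypothetical, and Spark's JDBC writer stands in here for the Sqoop export step:

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

object StateMetrics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("state-metrics")
      .enableHiveSupport()              // read the partitioned Hive table
      .getOrCreate()

    // Hypothetical partitioned/bucketed Hive table
    val txns = spark.table("sales.transactions")

    // In-memory aggregation: per-state metrics
    val metrics = txns
      .groupBy("state")
      .agg(
        F.count(F.lit(1)).as("txn_count"),
        F.sum("amount").as("total_amount")
      )

    // Placeholder JDBC target standing in for the Sqoop/OLTP export
    metrics.write
      .format("jdbc")
      .option("url", "jdbc:mysql://oltp-host:3306/reporting")   // hypothetical
      .option("dbtable", "state_metrics")
      .option("user", sys.env.getOrElse("DB_USER", ""))
      .option("password", sys.env.getOrElse("DB_PASS", ""))
      .mode("overwrite")
      .save()

    spark.stop()
  }
}
```

Partitioning and bucketing the source table on state, as noted in the bullets above, keeps an aggregation like this reading fewer files and enables bucket-based joins downstream.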
Environment: Scala, Spark, Kafka, Hive, Hortonworks, Oozie, Play Framework, Akka, Git, Elasticsearch, Logstash, Kibana, Kerberos
Confidential
Sr. Hadoop Developer/Admin
Responsibilities:
- Installed, configured, upgraded, and applied patches and bug fixes for Prod, Lab and Dev Servers.
- Installed, configured, and administered HDFS, Hive, Ranger, Pig, HBase, Oozie, Sqoop, Spark, and YARN.
- Involved in upgrading Cloudera Manager from version 5.5 to 5.6.
- Involved in capacity planning, load balancing and design of Hadoop clusters.
- Involved in setting up alerts in Cloudera Manager to monitor the health and performance of Hadoop clusters.
- Involved in installing and configuring authentication using Kerberos.
- Created and dropped users and granted and revoked permissions through policies as required using Ranger.
- Commissioned and decommissioned DataNodes in the cluster.
- Wrote and modified UNIX shell scripts to manage HDP environments.
- Involved in installing and configuring Apache Flume, Hive, Sqoop, and Oozie on the Hadoop cluster.
- Created directories and set up appropriate permissions for different applications and users (see the sketch after this list).
- Backed up HBase tables to HDFS using the export utility.
- Involved in creating users and user groups, assigning roles, and creating home directories for users.
- Installation, configuration, and administration of HDP on Red Hat Enterprise Linux 6.6.
- Used Sqoop to import data into HDFS from Oracle database.
- Detailed analysis of system and application architecture components per functional requirements.
- Reviewed and monitored system and instance resources (database storage, memory, CPU, network usage, and I/O contention) to ensure continuous operations.
- Provided 24x7 on-call support for production job failures and resolved issues in a timely manner.
- Developed UNIX scripts for scheduling the delta loads and master loads using the AutoSys scheduler.
- Troubleshot problems with databases, applications, and development tools.
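A minimal sketch of the directory-provisioning task referenced above, using the Hadoop FileSystem API from Scala. The path, owner, and group names are hypothetical, and setOwner requires HDFS superuser privileges:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.{FsAction, FsPermission}

object ProvisionAppDir {
  def main(args: Array[String]): Unit = {
    // Picks up core-site.xml / hdfs-site.xml from the classpath
    val fs = FileSystem.get(new Configuration())

    // Hypothetical application landing directory
    val appDir = new Path("/data/appx/landing")

    fs.mkdirs(appDir)

    // rwxr-x---: owner full access, group read/execute, others none
    fs.setPermission(appDir, new FsPermission(FsAction.ALL, FsAction.READ_EXECUTE, FsAction.NONE))

    // Assign ownership to the application's service account (needs superuser rights)
    fs.setOwner(appDir, "appx_user", "appx_group")

    fs.close()
  }
}
```

The equivalent hdfs dfs -mkdir / -chmod / -chown commands in a shell script accomplish the same thing; the API form is shown here only to keep the examples in one language.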
Environment: Hadoop, HDFS, Hive, Cloudera Manager, Sqoop, Flume, Oozie, CDH5, MongoDB, Cassandra, HBase, Hue, Kerberos and Unix/Linux
Confidential - Dallas, TX
Java/Hadoop Developer
Responsibilities:
- Imported all fact and dimension tables from SQL Server into Hadoop using Sqoop.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Involved in extracting customers' Big Data from various data sources into Hadoop HDFS, including data from mainframes, databases, and server logs.
- Developed Tableau visualizations and dashboards using Tableau Desktop.
- Developed Tableau workbooks from multiple data sources using Data Blending.
- Involved in managing and reviewing Hadoop log files.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Created Hive tables as per requirements, as managed or external tables with appropriate static and dynamic partitions, intended for efficiency.
- Implemented Partitioning, Bucketing in Hive for better organization of the data.
- Developed Python UDFs for Pig and Hive.
- Used Apache Kafka to gather log data and feed it into HDFS (see the sketch after this list).
- Performed data ingestion using Sqoop from various sources such as Informatica and Oracle.
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop as well as system specific jobs.
- Installed and configured various components of Hadoop ecosystem and maintained their integrity.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
- Developed Java MapReduce programs to encapsulate transformations.
- Participated in performance tuning at the database, transformation, and job levels.
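A minimal sketch of the log-gathering consumer referenced above, using the Kafka Java client from Scala (2.13 collection converters). The broker address, topic, and consumer group are hypothetical, and println stands in for the HDFS sink:

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.StringDeserializer

object LogConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")                 // hypothetical broker
    props.put("group.id", "log-ingest")                            // hypothetical consumer group
    props.put("key.deserializer", classOf[StringDeserializer].getName)
    props.put("value.deserializer", classOf[StringDeserializer].getName)

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("server-logs"))   // hypothetical topic

    try {
      while (true) {
        val records = consumer.poll(Duration.ofSeconds(1))
        for (record <- records.asScala) {
          // In the actual pipeline these records were landed in HDFS;
          // printing stands in for that sink here.
          println(s"${record.partition}/${record.offset}: ${record.value}")
        }
      }
    } finally {
      consumer.close()
    }
  }
}
```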
Environment: Hadoop, HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, HBase, CDH4, Cloudera Manager, MySQL, Eclipse
Confidential - Roseville, CA
Hadoop Developer
Responsibilities:
- Responsible for building data solutions in Hadoop using the Cascading framework.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Worked hands-on with the ETL process.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Upgraded the Hadoop cluster from CDH3 to CDH4 and integrated Hive with existing applications.
- Configured Ethernet bonding for all Nodes to double the network bandwidth.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Teradata into HDFS using Sqoop.
- Used Python and Shell scripts to automate the end-to-end ELT process
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Hive queries to process the data and generate the data cubes for visualizing.
- Performed data quality checks on data as per the business requirement.
- Performed data validation on target tables against the corresponding source tables (see the sketch after this list).
- Achieved high throughput and low latency for ingestion jobs by leveraging Sqoop.
- Transformed the raw data and loaded into stage and target tables.
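A minimal sketch of the source-versus-target validation referenced above, using Spark SQL from Scala. The stage and warehouse table names and the order_id key column are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object TargetValidation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("target-validation")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical stage (source) and warehouse (target) Hive tables
    val source = spark.table("stage.orders")
    val target = spark.table("warehouse.orders")

    // Row-count comparison
    val sourceCount = source.count()
    val targetCount = target.count()

    // Keys present in the source but missing from the target
    val missingKeys = source.select("order_id")
      .except(target.select("order_id"))
      .count()

    if (sourceCount == targetCount && missingKeys == 0)
      println(s"Validation passed: $sourceCount rows, no missing keys")
    else
      println(s"Validation FAILED: source=$sourceCount target=$targetCount missingKeys=$missingKeys")

    spark.stop()
  }
}
```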
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Teradata, Cloudera Manager, Pig, Sqoop, Oozie, Python
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in designing and developing modules at both Client and Server Side.
- Developed the UI using JSP, JavaScript and HTML.
- Responsible for validating the data at the client side using JavaScript.
- Interacted with external services to get the user information using SOAP web service calls
- Developed web components using JSP, Servlets and JDBC.
- Technical analysis, design, development and documentation with a focus on implementation and agile development.
- Developed a web-based reporting system with JSP, DAOs, and the Apache Struts Validator using the Struts framework.
- Designed the controller using Servlets.
- Accessed backend database Oracle using JDBC.
- Developed and wrote UNIX Shell scripts to automate various tasks.
- Developed user and technical documentation.
- Developed business objects, request handlers and JSPs for this project using Java Servlets and XML.
- Developed core Spring components for some of the modules and integrated them with the existing Struts framework.
- Actively participated in testing and designed user interface using HTML and JSPs.
- Implemented the database connectivity to Oracle using JDBC and designed and created tables using SQL (see the sketch after this list).
- Implemented the server-side processing using Java Servlets.
- Installed and configured the Apache web server and deployed JSPs and Servlets on the Tomcat server.
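The Oracle connectivity described above used the standard JDBC API; below is a minimal sketch of that pattern, written in Scala to keep the examples in one language (the java.sql calls are the same either way). The connection URL, credentials, table, and column names are hypothetical, and the Oracle JDBC driver is assumed to be on the classpath:

```scala
import java.sql.DriverManager

object OracleJdbcSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical connection details; real ones lived in the application's datasource config
    val url  = "jdbc:oracle:thin:@dbhost:1521:ORCL"
    val conn = DriverManager.getConnection(url,
      sys.env.getOrElse("DB_USER", ""), sys.env.getOrElse("DB_PASS", ""))

    try {
      // Parameterized query against a hypothetical table
      val stmt = conn.prepareStatement(
        "SELECT user_id, user_name FROM app_users WHERE status = ?")
      stmt.setString(1, "ACTIVE")

      val rs = stmt.executeQuery()
      while (rs.next()) {
        println(s"${rs.getLong("user_id")} -> ${rs.getString("user_name")}")
      }

      rs.close()
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```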
Environment: Java, Servlets, JSP, JavaScript, JDBC, Unix Shell scripting, HTML, Eclipse, Oracle 8i, WebLogic.