Sr. Hadoop Developer Resume
Daytona Beach, FL
PROFESSIONAL SUMMARY
- Around 9 years of programming experience spanning all phases of the Software Development Life Cycle (SDLC).
- Over 5 years of Big Data experience building highly scalable data analytics applications.
- Strong experience working with Hadoop ecosystem components such as HDFS, MapReduce, Spark, HBase, Oozie, Hive, Sqoop, Pig, Flume and Kafka.
- Good hands-on experience working with various Hadoop distributions, mainly Cloudera (CDH), Hortonworks (HDP) and Amazon EMR.
- Good understanding of Distributed Systems architecture and design principles behind Parallel Computing.
- Expertise in developing production-ready Spark applications using the Spark Core, DataFrame, Spark SQL, Spark MLlib and Spark Streaming APIs, along with scikit-learn and TensorFlow (see the sketch following this summary).
- Strong experience troubleshooting Spark application failures and fine-tuning Spark applications and Hive queries for better performance.
- Worked extensively on Hive for building complex data analytical applications.
- Strong experience writing complex MapReduce jobs, including development of custom InputFormats and RecordReaders.
- Sound knowledge of map-side joins, reduce-side joins, shuffle & sort, distributed cache, compression techniques, and multiple Hadoop input and output formats.
- Worked with Apache NiFi to automate and manage the flow of data between systems.
- Good experience working with AWS Cloud services such as S3, EMR, Redshift and Athena.
- Deep understanding of performance tuning and partitioning for optimizing Spark applications.
- Worked on building real-time data workflows using Kafka, Spark Streaming and HBase.
- Extensive knowledge of NoSQL databases like HBase, Cassandra and MongoDB.
- Solid experience working with CSV, text, SequenceFile, Avro, Parquet, ORC and JSON data formats.
- Extensive experience performing ETL on structured and semi-structured data using Pig Latin scripts.
- Designed and implemented Hive and Pig UDFs in Java for evaluating, filtering, loading and storing data.
- Experience using Hadoop ecosystem tools and visualizing processed data with Tableau.
- Good knowledge of core programming concepts such as algorithms, data structures and collections.
- Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful services, JDBC, JavaScript, XML, and HTML.
- Extensive experience developing and deploying applications on WebLogic, Apache Tomcat and JBoss.
- Development experience with RDBMS, including writing SQL queries, views, stored procedures, triggers, etc.
- Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
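For illustration, a minimal sketch of the kind of Spark DataFrame and Spark SQL transformation work summarized above, assuming a Spark build with Hive support; all table, column and path names here are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClickSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-click-summary")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical input: raw events landed in HDFS as Parquet
    val events = spark.read.parquet("/data/raw/events")

    // Basic cleansing and validation before summarization
    val cleaned = events
      .filter(col("user_id").isNotNull && col("event_ts").isNotNull)
      .withColumn("event_date", to_date(col("event_ts")))

    // Expose the cleansed data to Spark SQL for aggregation
    cleaned.createOrReplaceTempView("clicks")
    val summary = spark.sql(
      """SELECT event_date, event_type, COUNT(*) AS event_count
        |FROM clicks
        |GROUP BY event_date, event_type""".stripMargin)

    // Persist as a date-partitioned table for downstream reporting
    summary.write.mode("overwrite")
      .partitionBy("event_date")
      .saveAsTable("analytics.daily_event_counts")

    spark.stop()
  }
}
```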
TECHNICAL SKILLS
Programming Languages: Java/J2EE, JSP, Servlets, AJAX, EJB, Struts, Spring, JDBC, JavaScript, PHP and Python.
Databases: MySQL, SQL, DB2 and Teradata
Web Services: REST, AWS, SOAP, WSDL
Servers: Apache Tomcat, WebSphere, JBoss
Operating Systems: Unix, Linux, Windows, Solaris
IDE Tools: MyEclipse, Eclipse, NetBeans
QA Tools: Crashlytics (Fabric)
Web UI: HTML, JavaScript, XML, SOAP, WSDL
PROFESSIONAL EXPERIENCE
Confidential, Daytona Beach, FL
Sr. Hadoop Developer
Responsibilities:
- Developed Spark applications using Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
- Developed highly optimized Spark applications to perform data cleansing, validation, transformation and summarization activities according to the requirements.
- Built a data pipeline consisting of Spark, Hive, Sqoop and custom-built input adapters to ingest, transform and analyze operational data.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
- Used various data integration tools to move data between different databases and Hadoop.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Built real-time data pipelines by developing Kafka producers and Spark Streaming applications for consuming the data (see the sketch following this list).
- Ingested syslog messages, parsed them and streamed the data to Kafka.
- Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the transformed data back into HDFS.
- Exported the analyzed data to relational databases using Sqoop for the BI team to visualize and generate reports.
- Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis.
- Analyzed the data by writing Hive queries (HiveQL) to study customer behavior.
- Helped DevOps engineers deploy code and debug issues.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed Hive scripts in Hive QL to de-normalize and aggregate the data.
- Scheduled and executed workflows in Oozie to run various jobs.
- Processed data on Amazon AWS using Hadoop ecosystem tools.
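As referenced in the bullets above, a minimal sketch of a Spark streaming consumer (written here with Structured Streaming) for syslog-style data arriving on Kafka. The broker address, topic name, payload schema and output paths are hypothetical, the job assumes the Spark Kafka connector is on the classpath, and the matching Kafka producers and HBase sink are omitted.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object SyslogStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("syslog-stream")
      .getOrCreate()

    // Hypothetical schema for the parsed syslog payload
    val schema = new StructType()
      .add("host", StringType)
      .add("severity", StringType)
      .add("message", StringType)

    // Consume a hypothetical "syslog" topic from Kafka
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "syslog")
      .load()

    // Parse the JSON value and keep only well-formed records
    val parsed = raw
      .select(from_json(col("value").cast("string"), schema).as("event"))
      .select("event.*")
      .filter(col("host").isNotNull)

    // Land micro-batches in HDFS; a real pipeline might write to HBase instead
    val query = parsed.writeStream
      .format("parquet")
      .option("path", "/data/streaming/syslog")
      .option("checkpointLocation", "/checkpoints/syslog")
      .start()

    query.awaitTermination()
  }
}
```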
Environment: Hadoop, HDFS, HBase, Spark, Scala, Hive, MapReduce, Sqoop, ETL, Java, PL/SQL, Oracle 11g, Unix/Linux.
Confidential, Union, New Jersey
Sr. Hadoop Developer
Responsibilities:
- Developed multi-threaded Java-based input adapters for ingesting clickstream data from external sources like FTP servers and S3 buckets on a daily basis.
- Created Spark applications using Scala to enrich this clickstream data by combining it with enterprise data about the users.
- Implemented batch processing of jobs using the Spark Scala API.
- Worked with Apache NiFi to automate and manage the flow of data between systems.
- Developed Sqoop scripts to import/export data from Oracle to HDFS and into Hive tables.
- Stored the data in columnar formats using Hive.
- Involved in building and managing NoSQL database models using HBase.
- Used Spark to read data from Hive and write it to HBase (see the sketch following this list).
- Optimized Hive tables using techniques like partitioning and bucketing to improve HiveQL query performance.
- Worked with multiple file formats like Avro, SequenceFile, Parquet and ORC.
- Converted existing MapReduce programs to Spark Applications for handling semi structured data like JSON files, Apache Log files, and other custom log data.
- Loaded the final processed data into HBase tables to allow the downstream application team to build rich, data-driven applications.
- Processed data on Amazon AWS using Hadoop ecosystem tools.
- Worked with a team to improve the performance and optimization of existing algorithms in Hadoop using Spark, Spark SQL and DataFrames.
- Worked with Apache Ranger for enabling data security across the Hadoop ecosystem.
- Implemented business logic in Hive and wrote UDFs to process the data for analysis.
- Used Oozie to define a workflow to coordinate the execution of Spark, Hive and Sqoop jobs.
- Addressed issues arising from huge volumes of data and transactions.
- Documented and tracked operational problems in JIRA following standards and procedures.
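As referenced in the bullets above, a minimal sketch of reading a Hive table in Spark (Scala) and writing rows to HBase through the standard HBase Java client. The Hive table, HBase table, column family and column types are assumptions for illustration, not the actual project schema.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.{Row, SparkSession}

object HiveToHBase {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-hbase")
      .enableHiveSupport()
      .getOrCreate()

    // Read the enriched data from a hypothetical Hive table
    val enriched = spark.table("analytics.enriched_clicks")
      .select("user_id", "event_type", "event_count")

    // Write each partition to HBase through the standard Java client
    enriched.rdd.foreachPartition { rows: Iterator[Row] =>
      val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = conn.getTable(TableName.valueOf("user_activity"))
      try {
        rows.foreach { row =>
          // Row key: user id; column family "m" holds the metric columns (assumed layout)
          val put = new Put(Bytes.toBytes(row.getAs[String]("user_id")))
          put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("event_type"),
            Bytes.toBytes(row.getAs[String]("event_type")))
          put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("event_count"),
            Bytes.toBytes(row.getAs[Long]("event_count")))
          table.put(put)
        }
      } finally {
        table.close()
        conn.close()
      }
    }

    spark.stop()
  }
}
```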
Environment: Java 6, MongoDB, Apache Web server, HTML, JDBC, NoSQL, meteor.js, Eclipse, UNIX, CSS3, XML, JQuery, Oracle.
Confidential, Reston, VA
Hadoop Developer
Responsibilities:
- Involved in requirement analysis, design, coding and implementation phases of the project.
- Used Sqoop to load structured data from relational databases into HDFS.
- Loaded transactional data from Teradata using Sqoop and created Hive Tables.
- Worked on automation of delta feeds from Teradata using Sqoop and from FTP Servers to Hive.
- Performed Transformations like De-normalizing, cleansing of data sets, Date Transformations, parsing some complex columns.
- Worked with different compression codecs like GZIP, SNAPPY and BZIP2 in MapReduce, Pig and Hive for better performance.
- Worked with Apache NiFi to automate and manage the flow of data between systems.
- Used Ansible to automate framework setup and configuration.
- Handled Avro, JSON and Apache Log data in Hive using custom Hive SerDes.
- Worked on batch processing and scheduled workflows using Oozie.
- Implemented installation and configuration of multi-node cluster on the cloud using Amazon Web Services (AWS) on EC2.
- Worked in an Agile sprint environment.
- Used the Knox Gateway to secure Hadoop access between users and operators.
- Used cloud computing on a multi-node cluster, deployed Hadoop applications with data stored in S3, and used Elastic MapReduce (EMR) to run MapReduce jobs.
- Used HiveQL to create partitioned RC and ORC tables and applied compression techniques for optimized data processing and faster retrieval.
- Implemented Partitioning, Dynamic Partitioning and Buckets in Hive for efficient data access.
Environment: Apache Hadoop, HDFS, Cloudera Manager, Java, MapReduce, Eclipse Indigo, Hive, HBASE, PIG, Sqoop, Oozie, SQL, Spring.
Confidential, Dallas, TX
Hadoop Developer
Responsibilities:
- Communicated with business customers effectively to gather the required information for the project.
- Worked extensively on the Cloudera Distribution.
- Involved in loading data into HDFS from Teradata using Sqoop.
- Experienced in moving huge amounts of log file data from different servers.
- Worked on implementing complex data transformations using MapReduce framework.
- Generated structured data through MapReduce jobs, stored it in Hive tables and analyzed the results with Hive queries based on the requirements.
- Improved performance by implementing dynamic partitioning and bucketing in Hive and by designing managed and external tables (see the sketch following this list).
- Worked on migrating data from relational databases to Big Data technologies like Cassandra.
- Developed Pig Latin scripts and used ETL tools such as Informatica for some pre-aggregations.
- Worked on MapReduce programs to cleanse and pre-process data from various sources.
- Worked with Sequence files and Avro files in MapReduce programs.
- Created Hive generic UDFs to implement business logic and worked on incremental imports into Hive tables.
- Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the transformed data back into HDFS.
- Exported the analyzed data to relational databases using Sqoop for the BI team to visualize and generate reports.
- Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis.
- Worked with Apache NiFi to automate and manage the flow of data between systems.
- Worked with Talend to integrate data from different data systems into Hadoop.
- Used Kerberos authentication to provide authenticated access to distributed systems.
- Analyzed the data by performing Hive queries (Hive QL) to study customer behavior.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed Hive scripts in Hive QL to de-normalize and aggregate the data.
- Worked in an Agile sprint environment.
- Loaded processed data into HBase tables using HBase Java API calls.
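As referenced in the bullets above, a minimal sketch of the dynamic-partitioning side of that Hive work, issued here through Spark SQL with Hive support (this role's environment lists Spark and Scala). Database, table, column and path names are hypothetical, and the bucketing DDL is left out.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partition-load")
      .enableHiveSupport()
      .getOrCreate()

    // Allow dynamic partitions so partition values are derived from the data itself
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Hypothetical external table partitioned by date and stored as ORC
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS sales.transactions (
        |  txn_id STRING,
        |  customer_id STRING,
        |  amount DOUBLE
        |)
        |PARTITIONED BY (txn_date STRING)
        |STORED AS ORC
        |LOCATION '/data/warehouse/sales/transactions'""".stripMargin)

    // Dynamic-partition insert from a hypothetical staging table;
    // the partition column must come last in the SELECT list
    spark.sql(
      """INSERT OVERWRITE TABLE sales.transactions PARTITION (txn_date)
        |SELECT txn_id, customer_id, amount, txn_date
        |FROM sales.transactions_staging""".stripMargin)

    spark.stop()
  }
}
```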
Environment: Hadoop, HDFS, HBase, Spark, Scala, Hive, MapReduce, Sqoop, ETL, Java, PL/SQL, Oracle 11g, Unix/Linux.
Confidential
Manager
Responsibilities:
- Maintaining sites and workforce.
- Liaising with clients and reporting on progress to staff and the public.
- Supervising construction workers and hiring subcontractors.
- Buying materials for each phase of the project.
- Monitoring build costs and project progress.
- Checking and preparing site reports, designs and drawings.
- Maintaining quality control checks.
- Day to day problem solving and dealing with any issues that arise.
- Working on-site at clients’ businesses or in a site office
Confidential
Java Developer
Responsibilities:
- Implemented the presentation layer with HTML, CSS and JavaScript
- Developed web components using JSP, Servlets and JDBC
- Implemented secured cookies using Servlets.
- Wrote complex SQL queries and stored procedures.
- Implemented Persistent layer using Hibernate API
- Implemented Search queries using Hibernate Criteria interface.
- Used CSS to provide a good user interface.
- Provided support for loan reports for CB&T.
- Designed and developed loan reports for Evans Bank using Jasper and iReport.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Performed Object-Oriented Analysis and Design using UML, including development of class, sequence and state diagrams, and implemented these diagrams in Microsoft Visio.
- Maintained Jasper server on client server and resolved issues
- Actively involved in system testing.
- Fine-tuned SQL queries for maximum efficiency to improve performance.
- Designed tables and indexes following normalization principles.
- Involved in unit testing, integration testing and user acceptance testing.
- Utilized Java and SQL day-to-day to debug and fix issues with client processes.
Environment: Java, Servlets, HTML, JavaScript, JSP, Hibernate, JUnit, Oracle DB, SQL, Jasper Reports, iReport, Maven, Jenkins.