Sr. Bigdata/hadoop Engineer Resume
Fort Washington, PA
SUMMARY
- Around 6 years of strong experience in software development using Big Data, Hadoop, Apache Spark, Java/J2EE, Scala and Python technologies.
- Solid foundation in mathematics, probability and statistics, with broad practical experience in statistical and data mining techniques cultivated through industry work and academic programs.
- Involved in all Software Development Life Cycle (SDLC) phases, including Analysis, Design, Implementation, Testing and Maintenance.
- Strong technical, administration, and mentoring knowledge in Linux and Big Data/Hadoop technologies.
- Hands-on experience with major components of the Hadoop ecosystem, including Hadoop MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Flume and Avro.
- Work experience with cloud infrastructure like Amazon Web Services (AWS).
- Experience in importing and exporting data using Sqoop from relational database systems/mainframes to HDFS and vice versa.
- Installing, configuring and managing Hadoop clusters and data science tools.
- Managing the Hadoop distribution with Cloudera Manager, Cloudera Navigator and Hue.
- Setting up High Availability for Hadoop cluster components and edge nodes.
- Experience in developing Shell scripts and Python Scripts for system management.
- Experience in profiling large data sets using Informatica BDM 10.
- Well versed in software development methodologies such as Rapid Application Development (RAD), Agile and Scrum.
- Experience with Object-Oriented Analysis and Design (OOAD) methodologies.
- Experience in installations of software, writing test cases, debugging, and testing of batch and online systems.
- Experience in production, quality assurance (QA), System Integration Testing (SIT) and User Acceptance Testing (UAT).
- Expertise in J2EE technologies like JSP, Servlets, EJB 2.0, JDBC, JNDI and AJAX.
- Extensively worked on implementing SOA (Service-Oriented Architecture) using XML Web services (SOAP, WSDL, UDDI and XML parsers).
- Worked with XML parsers like JAXP (SAX and DOM) and JAXB.
- Expertise in applying Java Message Service (JMS) for reliable information exchange across Java applications.
- Proficient with Core Java and AWT, as well as markup and web technologies such as HTML 5.0, XHTML, DHTML, CSS, XML 1.1, XSL, XSLT, XPath, XQuery, Angular.js and Node.js.
- Worked with version control systems like Subversion, Perforce and Git to provide a common platform for all developers.
- Highly motivated team player with the ability to work independently and adapt quickly to new and emerging technologies.
- Creatively communicate and present models to business customers and executives, utilizing a variety of formats and visualization methodologies.
TECHNICAL SKILLS
Big Data Frameworks: Hadoop, Spark, Scala, Hive, Kafka, AWS, Cassandra, HBase, Flume, Pig, Sqoop, MapReduce, Cloudera, MongoDB
Big data distribution: Cloudera, Amazon EMR
Programming languages: Core Java, Scala, Python, SQL, Shell Scripting
Operating Systems: Windows, Linux (Ubuntu)
Databases: Oracle, SQL Server
Designing Tools: Eclipse
Java Technologies: JSP, Servlets, Junit, Spring, Hibernate
Web Technologies: XML, HTML, JavaScript, JVM, JQuery, JSON
Linux Experience: System Administration Tools, Puppet, Apache
Web Services: RESTful and SOAP
Frame Works: Jakarta Struts 1.x, Spring 2.x
Development methodologies: Agile, Waterfall
Logging Tools: Log4j
Application / Web Servers: CherryPy, Apache Tomcat, WebSphere
Messaging Services: ActiveMQ, Kafka, JMS
Version Tools: Git, SVN and CVS
Analytics: Tableau, SPSS, SAS EM and SAS JMP
PROFESSIONAL EXPERIENCE
Confidential, Fort Washington, PA
Sr. Bigdata/Hadoop Engineer
Responsibilities:
- Worked on importing data from various sources and performed transformations using MapReduce and Hive to load data into HDFS.
- Worked on compression mechanisms to optimize MapReduce Jobs.
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Created scripts to automate the process of Data Ingestion.
- Performed joins, group-bys and other operations in MapReduce using Java and Pig.
- Configured Sqoop jobs to import data from RDBMS into HDFS using Oozie workflows.
- Worked on setting up Pig, Hive and HBase on multiple nodes and developed applications using Pig, Hive, HBase and MapReduce.
- Worked on the conversion of existing MapReduce batch applications for better performance.
- Created HBase tables to store variable data formats coming from different portfolios.
- Performed real-time analytics on HBase using the Java API and REST API.
- Implemented HBase coprocessors to notify the support team when data is inserted into HBase tables.
- Analyzed customer behavior by performing clickstream analysis and used Flume to ingest the data.
- Experienced in working with Avro data files using the Avro serialization system.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources (see the sketch after this list).
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
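As an illustration of the Pig UDF work mentioned above, the following is a minimal sketch of a Java EvalFunc-style UDF; the class name NormalizeCode and the field-cleaning use case are hypothetical and not taken from the actual project.

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical Pig UDF: trims and upper-cases a string field before it is
// joined against reference data. Registered from a jar and called from a
// Pig script, e.g.:  REGISTER udfs.jar;  b = FOREACH a GENERATE NormalizeCode(code);
public class NormalizeCode extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;                     // propagate nulls unchanged
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```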
Environment: Hive, HDFS, MapReduce, Flume, Pig, Spark Core, Spark SQL, Oozie, Oracle, YARN, Netezza, GitHub, JUnit, Linux, HBase, Cloudera, Sqoop, Java, Scala, Maven, Splunk, Eclipse.
Confidential, Woodbridge NJ
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Good understanding of and hands-on experience with Hadoop stack internals, Hive, Pig and MapReduce.
- The system was initially developed in Java; the Java filtering program was restructured so that the business rule engine lives in a jar that can be called from both plain Java and Hadoop jobs.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Involved in defining job flows
- Involved in managing and reviewing Hadoop log files
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and developed Hive UDFs to extend the core functionality of Hive (see the sketch after this list).
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Monitored system health and logs and responded to any warning or failure conditions.
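As a hedged illustration of the Hive UDF work referenced above, here is a minimal sketch using the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name StripDomain and the email-cleaning use case are assumptions for demonstration, not details of the original project.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF: returns the local part of an e-mail address,
// e.g. strip_domain('user@example.com') -> 'user'.
public class StripDomain extends UDF {
    public Text evaluate(Text email) {
        if (email == null) {
            return null;                      // keep NULLs as NULLs
        }
        String value = email.toString();
        int at = value.indexOf('@');
        return new Text(at >= 0 ? value.substring(0, at) : value);
    }
}
```

In Hive, a UDF like this would typically be packaged in a jar, added with ADD JAR, and exposed with CREATE TEMPORARY FUNCTION strip_domain AS 'StripDomain' before being used in queries.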
Environment: Apache Hadoop, HDFS, MapReduce, Pig, Hive, Hive UDFs, Linux, MySQL, HBase, UNIX, Java, ETL, Eclipse.
Confidential
Hadoop Developer
Responsibilities:
- Developed Java MapReduce jobs for the aggregation and interest-matrix calculation for users (a minimal sketch follows this list).
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Experienced in managing and reviewing application log files.
- Ingested the application logs into HDFS and processed the logs using MapReduce jobs.
- Created and maintained the Hive warehouse for Hive analysis.
- Generated test cases for the new MR jobs.
- Involved in the pilot of Hadoop cluster hosted on Amazon Web Services (AWS)
- Ran various Hive queries on the data dumps and generated aggregated datasets for downstream systems for further analysis.
- Developed dynamically partitioned Hive tables to store data by date and workflow id partition.
- Used Apache Sqoop to load incremental user data into HDFS on a daily basis.
- Ran clustering and user recommendation agents on the weblogs and profiles of the users to generate the interest matrix.
- Worked on installing and configuring EC2 instances on Amazon Web Services (AWS) for establishing clusters on cloud.
- Installed and configured Hive and wrote Hive UDFs in Java and Python.
- Prepared the data for consumption by formatting it for upload to the UDB system.
- Led and programmed the recommendation logic for various clustering and classification algorithms using Java.
- Involved in migrating Hadoop jobs to higher environments such as SIT, UAT and Prod.
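As a hedged illustration of the aggregation-style MapReduce jobs referenced above, the following is a minimal Java sketch that counts events per user from tab-delimited logs; the class name UserEventCount, the input layout and the field positions are assumptions for demonstration only.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class UserEventCount {

    // Emits (userId, 1) for every log line; assumes the user id is the first
    // tab-delimited column of each line.
    public static class EventMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text userId = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length > 0 && !fields[0].isEmpty()) {
                userId.set(fields[0]);
                context.write(userId, ONE);
            }
        }
    }

    // Sums the counts per user; also reused as a combiner.
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "user event count");
        job.setJarByClass(UserEventCount.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A job like this would be packaged into a jar and submitted with hadoop jar, passing the input and output HDFS paths as arguments.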
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Scala, Cassandra, Pig, Sqoop, Oozie, ZooKeeper, Teradata, PL/SQL, MySQL, Windows, Hortonworks, HBase
Confidential
Java Developer
Responsibilities:
- Communicated with clients for requirements gathering and explained the requirements to team members.
- Analyzed the requirements and designed screen prototypes.
- Involved in Project Documentation.
- Involved in creation of Basic DB Architecture for the application.
- Involved in adding solution to VSS.
- Designing & Development of Screens.
- Coded JS functions for client validations.
- Created user Controls for reusability.
- Creation of Tables, Views, Packages, Sequences, Functions for all the modules of the project.
- Developed Crystal Reports.
- Integrating the functionality of all modules.
- Involved in deploying the application.
- Unit testing & integration testing.
- Designing test plan, test cases and checking the validation.
- Test whether the application meets the business requirements.
- Implementation of the system at the client location.
- Gave training to application users, interacted with the client and handled change requests, if any, from the client.
- Responsible for immediate error resolution.
Environment: Core Java, JavaScript, J2EE, Servlets, JSP, Design Patterns, JDBC, HTML, CSS, AJAX, Hibernate, WebLogic, Oracle 8i, ANT, LINUX, SVN, Windows XP