Sr. Hadoop Developer Resume
Phoenix, AZ
SUMMARY:
- 9+ years of professional experience covering analysis, design, development, integration, deployment and maintenance of quality software applications using Java/J2EE and Hadoop technologies.
- Experienced in installing, configuring and testing Hadoop ecosystem components (Hive, Pig, Sqoop, etc.) on Linux/UNIX, including Hadoop administration.
- Expertise in Java, Hadoop MapReduce, Pig, Hive, Oozie, Sqoop, Flume, Zookeeper, Impala and NoSQL databases.
- Excellent experience with the Hadoop ecosystem and an in-depth understanding of MapReduce and the Hadoop infrastructure.
- Excellent experience with the Amazon, Cloudera and Hortonworks Hadoop distributions, and in maintaining and optimizing AWS infrastructure (EMR, EC2, S3, EBS).
- Expertise in developing Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, YARN and the MapReduce programming paradigm.
- Experienced working with Hadoop big data technologies (HDFS and MapReduce programs), Hadoop ecosystem components (HBase, Hive, Pig) and the NoSQL database MongoDB.
- Experienced in using the column-oriented NoSQL database HBase.
- Extensive experience working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
- Experienced with major Hadoop ecosystem components including Hive, Sqoop and Flume, with knowledge of the MapReduce/HDFS framework.
- Experienced in applying MapReduce design patterns to solve complex data processing problems.
- Excellent knowledge of Talend Big Data integration for bringing business data into Hadoop and NoSQL stores.
- Hands-on programming experience in various technologies such as Java, J2EE, HTML and XML.
- Excellent working knowledge of Sqoop and Flume for data processing.
- Expertise in loading data from different data sources (Teradata and DB2) into HDFS using Sqoop and loading it into partitioned Hive tables.
- Experienced in Hadoop cluster maintenance, including data and metadata backups, file system checks, commissioning and decommissioning nodes, and upgrades.
- Extensive experience writing custom MapReduce programs for data processing and UDFs for both Hive and Pig in Java (a brief UDF sketch follows this summary).
- Strong experience in analyzing large data sets by writing Pig scripts and Hive queries.
- Extensive experience working with structured data using HiveQL and join operations, writing custom UDFs, and optimizing Hive queries.
- Experienced in importing and exporting data between HDFS and relational databases using Sqoop.
- Expertise in job workflow scheduling and monitoring tools like Oozie.
- Experienced in Apache Flume for collecting, aggregating and moving large volumes of data from various sources such as web servers and telnet sources.
- Extensively designed and executed SQL queries to ensure data integrity and consistency at the backend.
- Strong experience in architecting batch-style, large-scale distributed computing applications using tools such as Flume, MapReduce and Hive.
- Experience using various Hadoop distributions (Cloudera, Hortonworks, MapR, etc.) to fully implement and leverage new Hadoop features.
- Worked on custom Pig Loaders and storage classes to handle a variety of data formats such as JSON and compressed CSV.
- Experienced in working with different scripting technologies such as Python and UNIX shell scripts.
- Strong experience working in UNIX/Linux environments and writing shell scripts.
- Excellent knowledge of and working experience in Agile and Waterfall methodologies.
- Expertise in web page development using JSP, HTML, JavaScript, jQuery and Ajax.
- Experienced in writing database objects such as stored procedures, functions, triggers, PL/SQL packages and cursors for Oracle, SQL Server, MySQL and Sybase databases.
- Great team player and quick learner with effective communication, motivation and organizational skills, combined with attention to detail and a focus on business improvements.
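A minimal sketch of the kind of Java UDF for Hive mentioned above; the class name, column semantics and mapping are hypothetical and purely illustrative:

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF: normalizes a free-text state field to a two-letter code.
// Hive resolves the evaluate() method by reflection, so null input must be handled here.
public final class NormalizeState extends UDF {
    public Text evaluate(Text state) {
        if (state == null) {
            return null;
        }
        String s = state.toString().trim().toUpperCase();
        // Illustrative mapping only; a real UDF would load a full lookup table.
        if (s.equals("ARIZONA")) {
            return new Text("AZ");
        }
        return new Text(s);
    }
}

In HiveQL the compiled class would typically be registered with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.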
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, Zookeeper, Cloudera Manager; NoSQL Databases: HBase, Cassandra, MongoDB
Monitoring and Reporting: Tableau, custom shell scripts
Hadoop Distributions: Hortonworks, Cloudera, MapR
Build Tools: Maven, SQL Developer
Programming & Scripting: Java, J2EE, HTML, JavaScript, jQuery, PL/SQL, C, SQL, Shell Scripting, Python
Databases: Oracle, MySQL, MS SQL Server, Teradata
PROFESSIONAL EXPERIENCE:
Sr. Hadoop Developer
Confidential, Phoenix, AZ
Responsibilities:
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Integrated the scheduler with Oozie workflows to pull data from multiple data sources in parallel using forks.
- Created a data pipeline of MapReduce programs using chained mappers. Implemented an optimized join over different data sets to get top claims by state using MapReduce.
- Implemented complex MapReduce programs to perform joins on the map side using the distributed cache in Java (a mapper sketch follows this list).
- Created the high-level design for the data ingestion and data extraction module, and enhanced the Hadoop MapReduce job that joins incoming slices of data and picks only the fields needed for further processing, all within a distributed environment.
- Developed several advanced MapReduce programs to process data files received.
- Used Sqoop to import data into HDFS from a MySQL database and vice versa.
- Responsible for importing log files from various sources into HDFS using Flume.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Used Sqoop to load data from MySQL into HDFS on a regular basis.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Created Hive generic UDFs to handle business logic that varies by policy.
- Moved relational database data into Hive dynamic-partition tables via staging tables using Sqoop.
- Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
- Worked on custom Pig Loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig and Sqoop.
- Developed unit test cases using the JUnit and MRUnit testing frameworks.
- Experienced in Monitoring Cluster using Cloudera manager.
- Helped the Business intelligence team in designing dashboards and workbooks.
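A minimal sketch of the map-side join pattern described above, using the distributed cache; file names, paths and the record layout are hypothetical:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: joins claim records against a small state lookup file that is
// shipped to every node through the distributed cache, so no reduce-side join is needed.
public class ClaimStateJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> stateLookup = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // The driver attaches the lookup file with
        // job.addCacheFile(new URI("/lookup/states.txt#states.txt")),
        // which symlinks it into the task's working directory as states.txt.
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles != null && cacheFiles.length > 0) {
            try (BufferedReader reader = new BufferedReader(new FileReader("states.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split(",");   // e.g. "AZ,Arizona"
                    stateLookup.put(parts[0], parts[1]);
                }
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");  // e.g. "claimId,stateCode,amount"
        String stateName = stateLookup.getOrDefault(fields[1], "UNKNOWN");
        context.write(new Text(stateName), new Text(fields[0] + "," + fields[2]));
    }
}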
Environment: Hadoop, HDFS, HBase, MapReduce, Java, Hive, Pig, Sqoop, Flume, Oozie, Hue, SQL, ETL, Cloudera Manager, MySQL.
Sr. Big Data/Hadoop Developer
Confidential, Parker, CO
Responsibilities:
- Responsible for installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster, and created Hive tables to store the processed results in tabular format.
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the streamed data to HDFS using Scala (a streaming sketch follows this list).
- Developed Sqoop scripts to move data between Hive and the Vertica database.
- Processed data into HDFS by developing solutions, and analyzed the data using MapReduce, Pig and Hive to produce summary results from Hadoop for downstream systems.
- Built servers on AWS: imported volumes, launched EC2 instances, and created security groups, auto-scaling, load balancers, Route 53, SES and SNS within the defined virtual private cloud.
- Wrote MapReduce code to process and parse data from various sources and store the results in HDFS.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Created Managed tables and External tables in Hive and loaded data from HDFS.
- Developed Spark code by using Scala and Spark-SQL for faster processing and testing and performed complex HiveQL queries on Hive tables.
- Scheduled several time-based Oozie workflows by developing Python scripts.
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION and SPLIT to extract data from data files and load it into HDFS.
- Exported data to RDBMS servers using Sqoop and processed that data for ETL operations.
- Worked on S3 buckets on AWS to store CloudFormation templates and worked on AWS to create EC2 instances.
- Designed the ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop and MySQL.
- Handled end-to-end architecture and implementation of client-server systems using Scala, Akka, Java, JavaScript and related technologies on Linux.
- Optimized Hive tables using techniques such as partitioning and bucketing to provide better query performance.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig and Sqoop.
- Implemented Hadoop with AWS EC2 using a few instances to gather and analyze log files.
- Involved in Spark and Spark Streaming: creating RDDs and applying transformations and actions.
- Created partitioned tables and loaded data using both static partition and dynamic partition method.
- Developed custom Apache Spark programs in Scala to analyze and transform unstructured data.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
- Used Kafka for publish-subscribe messaging as a distributed commit log; experienced with its speed, scalability and durability.
- Followed the Test-Driven Development (TDD) process; extensive experience with the Agile and Scrum programming methodologies.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
- Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
- Involved in cluster maintenance, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive and Apache Pig.
- Analyzed the Hadoop cluster and different big data analytic tools including Pig, Hive, HBase and Sqoop.
- Improved performance by tuning Hive and MapReduce jobs.
- Researched, evaluated and utilized new technologies, tools and frameworks around the Hadoop ecosystem.
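A minimal sketch of the Kafka-to-HDFS streaming ingest described above; the project code is described as Scala, but this sketch uses Spark's equivalent Java Streaming API for consistency with the other examples, and the broker, topic, group and output path names are hypothetical:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaToHdfsStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaToHdfsStream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(60));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");          // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "hdfs-ingest");
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct stream from a hypothetical "events" topic.
        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Collections.singletonList("events"), kafkaParams));

        // Keep only the message payload and persist each micro-batch to HDFS.
        stream.map(ConsumerRecord::value)
              .dstream()
              .saveAsTextFiles("hdfs:///data/raw/events", "txt");

        jssc.start();
        jssc.awaitTermination();
    }
}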
Environment: HDFS, MapReduce, Hive, Sqoop, Pig, Flume, Vertica, Oozie Scheduler, Java, Shell Scripts, Teradata, Oracle, HBase, MongoDB, Cassandra, Cloudera, AWS, JavaScript, JSP, Kafka, Spark, Scala, ETL, Python.
Hadoop/Big Data Developer
Confidential, Grand Junction, CO
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop
- Designed the projects using MVC architecture, providing multiple views of the same model and thereby achieving efficient modularity and scalability.
- Built custom Talend jobs to ingest, enrich and distribute data in the Cloudera Hadoop ecosystem.
- Downloaded data generated by sensors tracking patients' body activities; the data was collected into HDFS from online aggregators by Kafka.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark context, Spark-SQL, DataFrames, pair RDDs and Spark on YARN with Scala.
- Implemented Spark Core in Scala to process data in memory.
- Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
- Used Spark Streaming to collect this data from Kafka in near real time, performing the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in a NoSQL store (HBase).
- Applied object-oriented and functional programming in Scala.
- Used Hadoop's Pig, Hive and MapReduce to analyze health insurance data, extracting data sets for meaningful information such as medicines, diseases, symptoms, opinions and geographic region details.
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Handled importing of data from various data sources, performed transformations using MapReduce, Spark and loaded data into HDFS.
- Developed an Oozie workflow to orchestrate a series of Pig scripts that cleanse data, such as removing information or merging many small files into a handful of very large, compressed files, using Pig pipelines in the data preparation stage.
- Used Pig in three distinct workloads: pipelines, iterative processing and research.
- Used Pig UDFs written in Python and Java, and used sampling of large data sets (a Pig UDF sketch follows this list).
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume, and processed the files using Piggybank.
- Extensively used Pig to communicate with Hive using HCatalog and with HBase using storage handlers.
- Created Pig Latin scripts and Sqoop scripts.
- Involved in transforming data from legacy tables to HDFS and HBase tables using Sqoop.
- Implemented exception tracking logic using Pig scripts
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
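A minimal sketch of a Java Pig UDF of the kind referenced above; the class name and the field it normalizes are hypothetical:

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical Pig UDF: normalizes a free-text symptom field before grouping and aggregation.
public class NormalizeSymptom extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;                    // propagate missing data as null
        }
        return input.get(0).toString().trim().toLowerCase();
    }
}

In a Pig Latin script the jar would be added with REGISTER and the function invoked inside a FOREACH ... GENERATE statement.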
Environment: Hadoop, MapReduce, Spark, Shark, Kafka, HDFS, Hive, Pig, Oozie, Core Java, Eclipse, HBase, Flume, Cloudera, Oracle 10g, UNIX Shell Scripting, Scala, MongoDB, Cassandra, Python.
Sr. Java/J2EE Developer
Confidential, Westminster, CO
Responsibilities:
- Used JSF framework to implement MVC design pattern.
- Developed and coordinated complex high quality solutions to clients using J2SE, J2EE, Servlets, JSP, HTML, Struts, Spring MVC, SOAP, JavaScript, JQuery, JSON and XML.
- Wrote JSF managed beans, converters and validators following framework standards, and used explicit and implicit navigation for page navigation (a managed-bean sketch follows this list).
- Designed and developed persistence layer components using the Hibernate ORM tool.
- Designed the UI using JSF tags, Apache Tomahawk and RichFaces.
- Used Oracle 10g as the backend to store and fetch data.
- Experienced in using IDEs such as Eclipse and NetBeans, integrated with Maven.
- Created real-time reporting systems and dashboards using XML, MySQL and Perl.
- Worked on RESTful web services that enforce a stateless client-server model and support JSON (a few services changed from SOAP to REST); involved in detailed analysis based on the requirement documents.
- Involved in design, development and testing of web application and integration projects using object-oriented technologies such as Core Java, J2EE, Struts, JSP, JDBC, Spring Framework, Hibernate, Java Beans, Web Services (REST/SOAP), XML, XSLT, XSL and Ant.
- Designed and implemented SOA-compliant management and metrics infrastructure for the Mule ESB infrastructure utilizing the SOA management components.
- Used Node.js for server-side rendering and implemented modules in Node.js to integrate with designs and requirements.
- Used JAX-WS for the front-end module to interact with the back-end module, as they run on two different servers.
- Responsible for offshore deliverables; provided design and technical help to the team and reviews to meet quality and timelines.
- Migrated existing Struts application to Spring MVC framework.
- Provided and implemented numerous solution ideas to improve the performance and stabilize the application.
- Extensively used LDAP with Microsoft Active Directory for user authentication at login.
- Developed unit test cases using JUnit.
- Created the project from scratch using AngularJS as the frontend and Node.js with Express as the backend.
- Involved in developing Perl scripts and other scripts such as JavaScript.
- Used Tomcat as the web server to deploy the OMS web application.
- Used the SOAP::Lite module to communicate with different web services based on the given WSDL.
- Prepared technical reports and documentation manuals during program development.
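A minimal sketch of a JSF managed bean of the kind referenced above; the bean, field and navigation outcome names are hypothetical:

import java.io.Serializable;

import javax.faces.bean.ManagedBean;
import javax.faces.bean.SessionScoped;

// Hypothetical session-scoped managed bean backing a login page.
@ManagedBean
@SessionScoped
public class LoginBean implements Serializable {

    private String username;
    private String password;

    public String getUsername() { return username; }
    public void setUsername(String username) { this.username = username; }
    public String getPassword() { return password; }
    public void setPassword(String password) { this.password = password; }

    // Explicit navigation: the returned outcome maps to a view in faces-config.xml.
    public String login() {
        return "home";
    }
}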
Environment: JDK 1.5, JSF, Hibernate 3.0, JIRA, Node.js, CruiseControl, Log4j, Tomcat, LDAP, JUnit, NetBeans, Windows/UNIX.
Java/J2EE Developer
Confidential, Orlando, FL
Responsibilities:
- Participated in requirement-gathering meetings with client-side business teams to understand the requirements.
- Prepared user requirement documentation.
- Worked extensively with Singleton, Session Facade, Service Locator, Factory, Business Delegate, Data Access Object and other J2EE core patterns.
- Involved in document workflow configuration used by IBM Enterprise Content Management.
- Designed the front-end screens using JSP tag libraries, XHTML, CSS, jQuery and JavaScript.
- Configured Hibernate O/R mappings and wrote SQL queries.
- Involved in developing database access components using Spring DAO integrated with Hibernate (a DAO sketch follows this list).
- Involved in Unit Testing, User Acceptance Testing and Bug Fixing.
- Wrote Custom Tags for custom requirements of fields on GUI.
- Used Hibernate with XML mappings as the ORM.
- Extensively used Spring AOP and dependency injection.
- Developed the project using agile development methodologies, with team coordination, peer reviews and collaborative system-level testing.
- Used AJAX in the Broadband module for dynamic form generation, form auto-completion and user validation.
- Involved in deploying and configuring applications in WebSphere.
- Used PL/SQL, Stored Procedures for handling Database in Oracle.
- Involved in loading and storing objects using Hibernate.
- Involved in configuring Hibernate mapping file.
- Used VersionOne to manage agile development.
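A minimal sketch of a Spring DAO integrated with Hibernate as referenced above; the Customer entity, its XML mapping and the query are hypothetical:

import java.util.List;

import org.springframework.orm.hibernate3.support.HibernateDaoSupport;

// Hypothetical DAO built on Spring's HibernateDaoSupport; Customer is assumed to be
// a Hibernate-mapped entity defined in an XML mapping file (not shown).
public class CustomerDao extends HibernateDaoSupport {

    public void save(Customer customer) {
        // HibernateTemplate handles session acquisition and exception translation.
        getHibernateTemplate().saveOrUpdate(customer);
    }

    @SuppressWarnings("unchecked")
    public List<Customer> findByState(String state) {
        // HQL query with a positional parameter.
        return (List<Customer>) getHibernateTemplate()
            .find("from Customer c where c.state = ?", state);
    }
}

The SessionFactory would be injected through the Spring configuration, which is what wires the DAO into Hibernate.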
Environment: Java, Struts, JSP, JDBC, XML, JUnit, Rational Rose, CVS, DB2, Windows.