Hadoop Developer Resume
Norwalk, CT
SUMMARY
- 8+ years of overall experience building and developing Hadoop MapReduce solutions, along with experience using Hive, Pig, Spark, Storm, Flume, and Kafka.
- Experience developing MapReduce programs on Apache Hadoop to process Big Data.
- Experience in installation, configuration, support, and monitoring of Hadoop clusters using Apache Ambari, Hortonworks and Cloudera distributions, and AWS.
- Implemented and controlled data migration to and from AWS.
- Experience in designing and deploying applications by selecting AWS services such as EMR, EC2, and S3 based upon the requirements.
- Experience with RDD architecture, implementing Spark operations on RDDs, and optimizing transformations and actions in Spark (a brief sketch follows this summary).
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems/mainframe.
- Experience in installation and setup of various Kafka producers and consumers along with the Kafka brokers and topics.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Experienced in managing Hadoop cluster using Cloudera Manager Tool.
- Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
- Experience with Oozie Workflow Engine in running workflow jobs with actions that run Java (core) MapReduce and Pig jobs.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Experience in analyzing data using HQL, Pig Latin, and custom Map Reduce programs in core Java.
- Good experience in writing Python scripts.
- Good experience with both JobTracker (MapReduce 1) and YARN (MapReduce 2).
- Good experience in Spark and its related technologies such as Spark SQL and Spark Streaming.
- Strong experience in developing MapReduce programs and customizing the framework at various levels; worked with input formats such as SequenceFileInputFormat and KeyValueTextInputFormat.
- Familiar with Java virtual machine (JVM) and multi-threaded processing.
- Worked on NoSQL databases including HBase, Cassandra and MongoDB.
- Experience in developing Maven builds and shell scripts to automatically compile, package, deploy, and test J2EE applications.
- Knowledge of job workflow scheduling and monitoring tools like Oozie.
- Implemented data science algorithms like shift detection in critical data points using Spark, doubling the performance.
- Good understanding of NoSQL databases and hands-on experience in writing applications on NoSQL databases such as Cassandra and MongoDB.
- Good understanding of in-memory data grid platforms such as Apache Ignite and GridGain.
- Experience in designing, developing and implementing connectivity products that allow efficient exchange of data between our core database engine and the Hadoop ecosystem.
- Experience in Apache Flume for efficiently collecting, aggregating, and moving large amounts of log data.
- Involved in developing web services using REST and the HBase native API client to query data from HBase.
- Experience as a Java Developer in Web/intranet, client/server technologies using Java, J2EE, Servlets, JSP, JSF, EJB, JDBC and SQL.
- Experienced in Test-Driven Development (TDD), Agile Scrum, Waterfall, and RUP methodologies to produce high-quality deliverables.
- Experience in building the CI system with Databricks, GitHub, Jenkins, and AWS.
- Experience in understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
- Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Experience in using Apache Solr for search applications.
- Familiarity with popular frameworks like Struts, Hibernate, Spring MVC, and AJAX.
- Good understanding of XML methodologies (XML, XSL, XSD), including Web Services and SOAP.
- Good understanding of Apache Hue and Accumulo.
- Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, estimates, designing custom solutions, development, leading developers, producing documentation, and production support.
- Good at using version control systems such as GitHub and SVN.
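The RDD experience noted above is easiest to illustrate with a small word-count pipeline: transformations (flatMap, mapToPair, reduceByKey) build a lazy lineage, and actions (count, take) trigger execution. This is a minimal sketch rather than project code; it assumes the Spark 2.x Java API, Java 8, and a hypothetical input path.

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class RddWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("RddWordCount").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Transformations are lazy; nothing executes until an action is called.
        JavaRDD<String> lines = sc.textFile("hdfs:///data/input/sample.txt"); // hypothetical path
        JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                .filter(word -> !word.isEmpty())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey(Integer::sum);

        // Cache before reuse so the lineage is not recomputed for each action.
        counts.cache();

        // Actions trigger execution of the transformation pipeline.
        System.out.println("Distinct words: " + counts.count());
        counts.take(10).forEach(t -> System.out.println(t._1() + " -> " + t._2()));

        sc.stop();
    }
}
```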
TECHNICAL SKILLS
Big Data: Hadoop HDFS, MapReduce, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Cassandra, Spark, Scala, Storm, Flume, Kafka, and Avro.
Web Technologies: Core Java, J2EE, Servlets, JSP, JDBC, JBOSS, XML, AJAX, SOAP, WSDL
Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)
Frameworks: MVC, Struts 2/1, Hibernate 3, Spring 3/2.5/2
Programming Languages: Java, Python, XML, Unix Shell scripting, HTML.
Databases: Oracle 11g/10g, DB2, MS-SQL Server, MySQL, MS-Access
Application Servers: WebLogic, WebSphere, Apache Tomcat
Monitoring & Reporting tools: Ganglia, Nagios, Custom Shell scripts
PROFESSIONAL EXPERIENCE
Confidential, Norwalk, CT
Hadoop Developer
Responsibilities:
- Worked on the Hadoop cluster and data querying tools such as Hive to store and retrieve data.
- Involved in the complete Software Development Life Cycle (SDLC) while developing applications.
- Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Flume.
- Developed Oozie workflows for scheduling ETL processes and Hive scripts.
- Worked with teams to analyze anomaly detection and ratings of data.
- Implemented custom input format and record reader to read XML input efficiently using SAX parser.
- Involved in writing queries in Spark SQL using Scala.
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suits the current requirements.
- Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Worked with the RDD architecture, implementing Spark operations on RDDs and optimizing transformations and actions in Spark.
- Designed multiple Python packages that were used within a large ETL process used to load 2TB of data from an existing Oracle database into a new PostgreSQL cluster
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Loading data from Linux file system to HDFS and vice-versa
- Developed UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation queries, and exported results back into OLTP systems through Sqoop.
- Built a POC for enabling member and suspect search using Solr.
- Used CSVExcelStorage in Pig to parse data with different delimiters.
- Installed and monitored Hadoop ecosystems tools on multiple operating systems like Ubuntu, CentOS.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
- Modified reports and Talend ETL jobs based on the feedback from QA testers and Users in development and staging environments.
- Involved in setting up the QA environment by implementing Pig and Sqoop scripts.
- Responsible for designing and implementing the ETL process using Talend to load data into HDFS; worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems/mainframe.
- Developed Pig Latin scripts to do operations of sorting, joining and filtering enterprise data.
- Implemented test scripts to support test driven development and integration.
- Developed multiple MapReduce jobs in Java to clean datasets.
- Involved in loading data from Linux file systems, servers, and Java web services using Kafka producers and consumers.
- Involved in developing code to write canonical-model JSON records from numerous input sources to Kafka queues (a minimal producer sketch follows this list).
- Collected log data from web servers and integrated it into HDFS using Flume.
- Developed UNIX shell scripts for creating the reports from Hive data.
- Manipulated, serialized, and modeled data in multiple formats such as JSON and XML.
- Involved in setting up MapReduce 1 and MapReduce 2.
- Prepared Avro schema files for generating Hive tables.
- Created Hive tables, loaded data into the tables, and queried the data using HQL.
- Installed and Configured Hadoop cluster using Amazon Web Services (AWS) for POC purposes.
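As referenced above, canonical-model JSON records were published to Kafka. Below is a minimal, hypothetical producer sketch using the standard Kafka Java client; the broker address, topic name, and record schema are illustrative assumptions, not taken from the project.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CanonicalRecordProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");              // hypothetical broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // A canonical-model record serialized as JSON; the fields here are illustrative only.
            String json = "{\"source\":\"webservice\",\"entityId\":\"42\",\"payload\":\"...\"}";
            producer.send(new ProducerRecord<>("canonical-records", "42", json)); // hypothetical topic
            producer.flush();
        }
    }
}
```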
Environment: Hadoop MapReduce 2 (YARN), HDFS, Pig, Hive, Flume, Cassandra, Eclipse, Core Java, Sqoop, Spark, Maven, Spark SQL, Cloudera, Solr, Talend, Linux shell scripting.
Confidential, San Francisco, CA
Hadoop Developer
Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC), such as design, development, and unit testing.
- Developed and deployed UI layer logics of sites using JSP, XML, JavaScript, HTML/DHTML, and Ajax.
- Created Python scripts that provided constant updates about large data transfers via HipChat.
- Wrote scripts in Python for extracting data from HTML files.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Analyzed the Hadoop cluster using different analytic tools such as Pig and Impala.
- Managed and reviewed Hadoop log files.
- Extracted files from CouchDB through Sqoop, placed them in HDFS, and processed them.
- Ran Hadoop streaming jobs to process terabytes of XML-format data.
- Worked with the Spark ecosystem, using Spark SQL and Scala queries on different formats like text and CSV files.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive, and wrote Hive UDFs and UDAFs (a minimal UDF sketch follows this list).
- Wrote a Storm topology to accept events from Kafka producers and emit them to Cassandra.
- Involved in setting up the QA environment by implementing Pig and Sqoop scripts.
- Involved in creating Hive tables, loading them with data, and writing Hive queries (HQL) that run internally as MapReduce jobs.
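As noted above, custom Hive UDFs were part of this work. The sketch below shows what a simple UDF can look like using the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and normalization logic are illustrative assumptions, not the project's actual functions.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF that trims and lower-cases a string column.
public final class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                      // preserve NULLs as Hive expects
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Such a UDF would typically be registered from the packaged JAR with ADD JAR and CREATE TEMPORARY FUNCTION before being called in HiveQL.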
Environment: Java 6, Core Java, Eclipse, Linux, Hadoop, Spark, HBase, Sqoop, Pig, Impala, Hive, HQL, Flume, Spring MVC, Python, Maven, Oracle 11g, XML, Cloudera, Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, MS SQL Server 2008.
Confidential, Seattle, WA
Java/Hadoop Developer
Responsibilities:
- Developed the application using the Struts framework, which leverages the classical Model View Controller (MVC) architecture; UML diagrams such as use cases, class diagrams, interaction diagrams, and activity diagrams were used.
- Participated in requirement gathering and converting the requirements into technical specifications.
- Extensively worked on User Interface for few modules using JSPs, JavaScript and Ajax.
- Created business logic using Servlets and Session Beans, and deployed them on the WebLogic server.
- Wrote complex SQL queries and stored procedures.
- Developed the XML Schema and Amazon Web services for the data maintenance and structures.
- Managed the Hadoop cluster using a cluster management tool.
- Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive (a minimal map-only job sketch follows this list).
- Selected the appropriate AWS services based upon data, compute, and system requirements.
- Installed and Configured Hadoop cluster using Amazon Web Services (AWS) EMR.
- Implemented the Web Service client for the login authentication, credit reports and applicant information using Apache Axis 2 Web Service.
- Created an integration between Hive and HBase for effective usage and performed MRUnit testing for the MapReduce jobs.
- Gained good experience with NoSQL databases such as MongoDB.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive, and wrote Hive UDFs.
- Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for Oracle 9i database.
- Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management.
- Used the Struts validation framework for form-level validation.
- Wrote test cases in JUnit for unit testing of classes.
- Involved in developing templates and screens in HTML and JavaScript.
- Involved in integrating Web Services using WSDL and UDDI.
- Suggested latest upgrades for Hadoop clusters.
- Created HBase tables to load large sets of data coming from UNIX and NoSQL sources.
- Built and deployed Java applications into multiple Unix based environments and produced both unit and functional test results along with release notes
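As mentioned above, MapReduce jobs were written against the Java API. The sketch below is a minimal, map-only cleaning job, not the project's actual code; the CSV layout, field count, and validity check are assumed purely for illustration.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Map-only job that drops records with a missing or malformed field. */
public class RecordCleaner {

    public static class CleanMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1); // assumed CSV layout
            if (fields.length == 5 && !fields[0].isEmpty()) {  // illustrative validity check
                context.write(NullWritable.get(), value);
            } else {
                context.getCounter("clean", "dropped").increment(1);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "record-cleaner");
        job.setJarByClass(RecordCleaner.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0);                       // map-only: mapper output is final
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```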
Environment: TDD, JDK 1.5, J2EE 1.4, Struts 1.3, JSP, Servlets 2.5, WebSphere 6.1, HTML, XML, ANT 1.6, JavaScript, Node.js, JUnit 3.8, HDFS, MongoDB, Hive, Impala, HBase, UNIX, AWS, Hortonworks
Confidential, Dallas, TX
ETL Developer
Responsibilities:
- Analyzed the requirements and framed the business logic for the ETL process.
- Extracted data from Oracle as one of the source databases.
- Followed Star Schema to design dimension and fact tables.
- Collected and linked metadata from diverse sources, including relational databases (Oracle), XML, and flat files.
- Converted more than 15,000 scripts from Teradata to Netezza.
- Used the DataStage ETL tool to copy data from Teradata to Netezza.
- Developed mappings in Informatica to load the data including facts and dimensions from various sources into the Data Warehouse, using different transformations like Source Qualifier, JAVA, Expression, Lookup, Aggregate, Update Strategy and Joiner.
- Loaded the flat files data using Informatica to the staging area.
- Created shell scripts for generic use.
- Used the data integration tool Pentaho to design ETL jobs in the process of building data warehouses.
- Developed unit/assembly test cases and UNIX shell scripts to run along with daily/weekly/monthly batches to reduce or eliminate manual testing effort
Environment: Windows XP/NT, Netezza, Informatica PowerCenter 9.1/8.6, UNIX, Pentaho Data Integration, Oracle 11g, SQL, Oracle Designer, MS Visio, Autosys, Korn Shell, Quality Center 10.
Confidential
Java Developer
Responsibilities:
- Played an active role in the team by interacting with welfare business analyst/program specialists and converted business requirements into system requirements.
- Developed and deployed UI layer logics of sites using JSP.
- Struts (MVC) was used for implementing the business model logic.
- Worked with Struts MVC objects such as ActionServlet, controllers, validators, Web Application Context, handler mappings, and message resource bundles, and used JNDI for look-up of J2EE components.
- Developed dynamic JSP pages with Struts.
- Developed the XML data object to generate the PDF documents and other reports.
- Used Hibernate, DAO, and JDBC for data retrieval and modifications in the database (a minimal DAO sketch follows this list).
- Messaging and interaction with web services was done using SOAP and REST.
- Developed JUnit test cases for unit testing as well as for system and user test scenarios.
- Worked with RESTful web services to enable interoperability.
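As noted above, data access combined DAO classes with JDBC. The sketch below is a minimal, hypothetical DAO written against plain JDBC on a modern JDK; the connection URL, APPLICANT table, and column names are illustrative only.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Minimal JDBC DAO for reading one applicant row; table and columns are illustrative. */
public class ApplicantDao {
    private final String url;      // e.g. "jdbc:oracle:thin:@//dbhost:1521/APPDB" (hypothetical)
    private final String user;
    private final String password;

    public ApplicantDao(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    public String findNameById(long id) throws SQLException {
        String sql = "SELECT NAME FROM APPLICANT WHERE ID = ?";   // hypothetical table
        try (Connection con = DriverManager.getConnection(url, user, password);
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("NAME") : null;
            }
        }
    }
}
```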
Environment: Core Java, J2EE, JDBC, Java 1.4, Servlets, JSP, Struts, Hibernate, Web services, RESTful services, SOAP, WSDL, Design Patterns, MVC, HTML, JavaScript 1.2, WebLogic 8.0, XML, JUnit, Oracle 10g, MyEclipse.
Confidential
Java Developer
Responsibilities:
- Developed the application under the JEE architecture and designed dynamic, browser-compatible user interfaces using JSP, custom tags, HTML, CSS, and JavaScript.
- Deployed and maintained the JSP and Servlet components on WebLogic 8.0.
- Developed the application server persistence layer using JDBC, SQL, and Hibernate.
- Used JDBC to connect the web applications to databases.
- Developed and utilized J2EE services and JMS components for messaging communication in WebLogic (a minimal JMS sketch follows this list).
- Configured the development environment using the WebLogic application server for developers' integration testing.
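As referenced above, messaging used JMS components on WebLogic. Below is a minimal, hypothetical sender sketch using the standard JMS 1.1 API; the JNDI names and message payload are assumptions for illustration, not the project's actual resources.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

/** Sends a single text message to a queue looked up from the server's JNDI tree. */
public class OrderMessageSender {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext(); // assumes jndi.properties points at the server
        ConnectionFactory factory =
                (ConnectionFactory) ctx.lookup("jms/AppConnectionFactory"); // hypothetical JNDI name
        Queue queue = (Queue) ctx.lookup("jms/OrderQueue");                 // hypothetical JNDI name

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            TextMessage message = session.createTextMessage("{\"orderId\":123}"); // illustrative payload
            producer.send(message);
        } finally {
            connection.close(); // closing the connection also closes its sessions and producers
        }
    }
}
```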
Environment: Java/J2EE, SQL, Oracle 10g, JSP 2.0, EJB, AJAX, JavaScript, WebLogic 8.0, HTML, JDBC 3.0, XML, JMS, log4j, JUnit, Servlets, MVC, MyEclipse.