Hadoop Developer Resume
San Jose, CA
PROFESSIONAL SUMMARY:
- 8+ years of IT experience across the complete software development life cycle, applying object-oriented analysis and design with Big Data technologies / the Hadoop ecosystem, SQL, Java, and J2EE technologies.
- Around 5 years of experience working on Big Data and Data Science, building advanced customer insight and product analytics platforms using Big Data and open source technologies.
- Broad experience in Data Mining, Real-Time Analytics, Business Intelligence, Machine Learning and Web Development.
- Strong skills in developing applications involving Big Data technologies like Hadoop, Spark, Elasticsearch, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Hortonworks, Cloudera, Mahout, Avro and Scala.
- Skilled in programming within the MapReduce framework and the Hadoop ecosystem.
- Very good experience in designing and implementing MapReduce jobs to support distributed processing of large data sets on the Hadoop cluster.
- Experience implementing an inverted indexing algorithm using MapReduce (a sketch follows this summary).
- Extensive experience in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Hands on experience in migrating complex MapReduce programs into Apache Spark RDD transformations.
- Experience in setting up standards and processes for Hadoop based application design and implementation.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting and HDFS.
- Worked on developing ETL processes to load data from multiple data sources into HDFS using Flume and Sqoop, perform structural modifications using MapReduce and Hive, and analyze data using visualization/reporting tools.
- Experience in writing Pig UDFs (Eval, Filter, Load and Store) and macros.
- Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality.
- Exposure to using Apache Kafka to develop data pipelines that carry logs as a stream of messages between producers and consumers.
- Experience in integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
- Very good understanding of NoSQL databases like MongoDB, Cassandra and HBase.
- Experience in coordinating Cluster services through ZooKeeper.
- Hands-on experience in setting up Apache Hadoop, MapR and Hortonworks clusters.
- Good knowledge of Apache Hadoop cluster planning, including choosing the hardware and operating systems to host an Apache Hadoop cluster.
- Experience with Hadoop distributions like Cloudera, Hortonworks, BigInsights, MapR, Windows Azure, and Impala.
- Experience using integrated development environments like Eclipse, NetBeans, JDeveloper and MyEclipse.
- Excellent understanding of relational databases as they pertain to application development, using several RDBMSs including IBM DB2, Oracle 10g, MS SQL Server 2005/2008 and MySQL, with strong database skills including SQL, stored procedures and PL/SQL.
- Working knowledge of J2EE development with the Spring, Struts and Hibernate frameworks in various projects, and expertise in Web Services (JAXB, SOAP, WSDL, RESTful) development.
- Experience in writing tests using Specs2, ScalaTest, Selenium, TestNG and JUnit.
- Ability to work on diverse application servers like JBoss, Apache Tomcat and WebSphere.
- Worked on different operating systems like UNIX/Linux and Windows XP.
- A passion for learning new things (new languages and new implementations) has kept me up to date with the latest trends and industry standards.
- Proficient in adapting to new work environments and technologies.
- Quick learner and self-motivated team player with excellent interpersonal skills.
- Well focused and able to meet expected deadlines on target.
- Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
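A minimal sketch, in Java, of the inverted-index MapReduce pattern mentioned above, assuming plain-text input where the source file name stands in for the document ID; class names and the tokenization rule are illustrative, not taken from any specific project:

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class InvertedIndex {

    // Mapper: emit (term, documentName) for every token in the input line.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String docName = ((FileSplit) context.getInputSplit()).getPath().getName();
            for (String token : line.toString().toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    context.write(new Text(token), new Text(docName));
                }
            }
        }
    }

    // Reducer: collapse the document names for each term into a de-duplicated posting list.
    public static class PostingReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text term, Iterable<Text> docs, Context context)
                throws IOException, InterruptedException {
            Set<String> postings = new HashSet<>();
            for (Text doc : docs) {
                postings.add(doc.toString());
            }
            context.write(term, new Text(String.join(",", postings)));
        }
    }
}

The driver boilerplate that wires the mapper and reducer into a Job with Text output key/value classes is omitted here.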
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, MongoDB, Avro, Hadoop Streaming, Cassandra, Oozie, Zookeeper, Spark, Storm, Kafka
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, JavaBeans
IDEs: Eclipse, NetBeans, WSAD, Oracle SQL Developer
Big data Analytics: Datameer 2.0.5
Frameworks: MVC, Struts, Hibernate, Spring and MRUnit
Languages: C, C++, Java, Python, Linux shell scripts, SQL
Databases: Cassandra, MongoDB, HBase, Teradata, Oracle, MySQL, DB2
Web Servers: JBoss, WebLogic, WebSphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, CSS, AJAX, JSON, Servlets, JSP
Reporting Tools: Jasper Reports, iReports
ETL Tools: Informatica, Pentaho
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential, San Jose, CA
Responsibilities:
- Developed Java code that streams web log data into Hive using REST services.
- Worked on migrating data from MongoDB to Hadoop.
- Worked on integrating SFDC with Hadoop.
- Developed Java code that streams Salesforce data into Hive using the Streaming API.
- Executed Hive queries on tables stored in Hive to perform data analysis and meet the business requirements.
- Worked on configuring the ZooKeeper and Kafka clusters.
- Worked on creating Kafka topics and partitions and writing custom partitioner classes.
- Worked on Big Data Integration and Analytics based on Hadoop, Spark and Kafka.
- Developed Spark code using Python and Spark-SQL/Streaming for faster processing of data.
- Built a real-time pipeline for streaming data using Kafka and Spark Streaming (a sketch follows this list).
- Installed and configured a Hadoop cluster using Ambari, along with Hive.
- Processing large data sets in parallel across the Hadoop cluster for pre-processing.
- Developed the code for importing and exporting data into HDFS using Sqoop and Flume.
- Wrote shell scripts that run multiple Hive jobs to incrementally refresh different Hive tables, which are used to generate reports in Tableau for business use.
- Moved log data from the Logstash server into Hadoop using Flume.
- Wrote Java code that executes different MongoDB queries.
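A minimal sketch, in Java, of the Kafka-to-Spark Streaming pipeline referenced above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group and the simple per-batch count are illustrative placeholders for the actual log-processing logic:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class WebLogStreamJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("weblog-stream");
        // 10-second micro-batches; the real interval would depend on the reporting SLA.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");   // placeholder broker address
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "weblog-consumers");        // placeholder consumer group
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct stream from the "weblogs" topic (placeholder name).
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("weblogs"), kafkaParams));

        // Stand-in transformation: count log lines per micro-batch and print to the driver log.
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}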
Environment: Hadoop, Hive, Flume, Linux, Shell Scripting, Java, Eclipse, MongoDB, Kafka, Spark, Zookeeper, Sqoop, Ambari.
Hadoop Developer
Confidential, El Segundo, CA
Responsibilities:
- Developed MapReduce jobs in Java for data cleansing and preprocessing, and implemented complex data analytics algorithms.
- Developed MapReduce programs to join data from different data sources using optimized joins, implementing bucketed joins or map joins depending on the requirement.
- Imported data from structured data sources into HDFS using Sqoop incremental imports.
- Implemented custom Kafka partitioners to send data to different categorized topics (a partitioner sketch follows this list).
- Implemented a Storm topology with stream groupings to perform real-time analytical operations.
- Implemented Kafka spouts for streaming data and different bolts to consume the data.
- Created Hive tables and partitions and implemented incremental imports to perform ad-hoc queries on structured data.
- Created Hive generic UDFs to process business logic with HiveQL.
- Involved in optimizing Hive queries and improving performance by configuring Hive query parameters.
- Used Cassandra Query Language (CQL) to perform analytics on time series data.
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
- Responsible for running Hadoop streaming jobs to process terabytes of XML Data.
- Development of Oozie workflow for orchestrating and scheduling the ETL process.
- Involved in implementing the Avro, ORC, and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Wrote Unix shell scripts in combination with Talend data maps to process the source files and load them into the staging database.
- Involved in creation of virtual machines and infrastructure in the Azure Cloud environment.
- Involved in developing Azure Web role and Worker roles.
- Worked on retrieving transaction data from the RDBMS into HDFS, computing the total transacted amount per user using MapReduce, and saving the output in a Hive table.
- Used Talend Studio 6.2 to re-write the SSIS ETL packages.
- Experience in implementing Kafka consumers and producers by extending the Kafka high-level API in Java and ingesting data into HDFS or HBase depending on the context.
- Worked on creating the workflow to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Developed SQL scripts using Spark for handling different data sets and verified their performance against MapReduce jobs.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs and Python.
- Worked on Amazon AWS concepts like EMR and EC2 web services for fast and efficient processing of Big Data.
- Responsible for maintaining and expanding the AWS cloud infrastructure using AWS services (SNS, SQS).
- Developed Spark scripts using Python shell commands as per the requirements.
- Experience implementing machine learning techniques in Spark using Spark MLlib.
- Involved in moving data from Hive tables into Cassandra for real-time analytics.
- Involved in using Hadoop benchmarks for monitoring and testing the Hadoop cluster.
- Involved in implementing test cases and testing MapReduce programs using MRUnit and other mocking frameworks.
- Involved in cluster maintenance, which includes adding and removing cluster nodes, cluster monitoring and troubleshooting, and reviewing and managing data backups and Hadoop log files.
- Involved in implementing Maven build scripts for Maven projects and integrating the builds with Jenkins.
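A minimal sketch of a custom Kafka partitioner of the kind mentioned above; the "priority-" category prefix and the reserved partition number are illustrative assumptions rather than the actual business rule:

import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Sends records whose key carries a "priority-" prefix (an illustrative category marker)
// to a reserved partition; all other records are spread across partitions by key hash.
public class CategoryPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        String k = key == null ? "" : key.toString();
        if (k.startsWith("priority-")) {
            return 0;   // reserved partition for the "hot" category
        }
        // Mask off the sign bit so the result is always a valid, non-negative partition index.
        return (k.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}

The producer picks this class up through its configuration, e.g. props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, CategoryPartitioner.class.getName()).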
Environment: Hadoop, Cloudera, MapReduce, Hive, Spark, Avro, Kafka, Storm, Linux, Sqoop, Shell Scripting, Oozie, Cassandra, Git, XML, Scala, Java, Maven, Eclipse, Oracle.
Hadoop Developer
Confidential, Fayetteville, NY
Responsibilities:
- Involved in the full life cycle of the project, from design, analysis, and logical and physical architecture modeling through development, implementation and testing.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Developed algorithms for identifying influencers within specified social network channels.
- Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Analyzing data with Hive, Pig and Hadoop Streaming.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
- Used OOZIE Operational Services for batch processing and scheduling workflows dynamically.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Experienced in working with Apache Storm.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Performed data mining investigations to find new insights related to customers.
- Involved in forecast based on the present results and insights derived from data analysis.
- Developed a domain-specific sentiment analysis system using supervised machine learning.
- Involved in collecting data and identifying data patterns to build a trained model using machine learning.
- Configured the Hadoop environment with Kerberos authentication, NameNodes, and DataNodes.
- Designed Sources to Targets mappings from SQL Server, Excel/Flat files to Oracle using Informatica Power Center.
- Created Data Marts and loaded the data using Informatica Tool.
- Developed and generated insights based on brand conversations, which in turn helped drive brand awareness, engagement and traffic to social media pages.
- Involved in identifying topics and trends and building context around the brand.
- Developed different formulas for calculating engagement on social media posts.
- Involved in identifying and analyzing defects, questionable function errors and inconsistencies in output.
- Involved in reviewing technical documentation and providing feedback.
- Involved in fixing issues arising from duration testing.
Environment: Java, NLP, HBase, Machine Learning, Hadoop, HDFS, MapReduce, Hortonworks, Hive, Apache Storm, Sqoop, Flume, Oozie, Apache Kafka, Zookeeper, MySQL, and Eclipse
Sr. Java / Hadoop Developer
Confidential, Houston, TX
Responsibilities:
- Developed high-level design documents, Use case documents, detailed design documents and Unit Test Plan documents and created Use Cases, Class Diagrams and Sequence Diagrams using UML.
- Extensive involvement in database design and development, including coding of stored procedures, DDL & DML statements, functions and triggers.
- Utilized Hibernate for object/relational mapping to provide transparent persistence onto the SQL server.
- Developed a portlet-style user experience using Ajax and jQuery.
- Used Spring IoC for creating the beans to be injected at run time.
- Modified the existing JSP pages using JSTL.
- Used Spring Tool Suite (STS) as the IDE for development.
- Used jQuery for client-side JavaScript methods.
- Developed Pig UDFs to pre-process the data for analysis (a sketch follows this list).
- Built a custom cross-platform architecture using Java, Spring Core/MVC and Hibernate through the Eclipse IDE.
- Involved in writing PL/SQL for the stored procedures.
- Designed UI screens using JSP, Struts tags, HTML, jQuery. Used JavaScript for client side validation.
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
- Responsible to manage data coming from different sources.
- Worked with business teams and created Hive queries for ad hoc access.
- Loaded daily data from websites to Hadoop cluster by using Flume.
- Created complex Hive tables and executed complex Hive queries on Hive warehouse.
- Wrote MapReduce code to convert unstructured data to semi structured data.
- Used Pig for the extraction, transformation and loading of semi-structured data.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Created Hive tables and worked on them using HiveQL.
- Used Pig as an ETL tool to do transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Designed and implemented MapReduce jobs to support distributed data processing.
- Supported MapReduce programs running on the cluster.
- Involved in HDFS maintenance and loading of structured and unstructured data.
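A minimal sketch of a Pig EvalFunc-style UDF like the pre-processing UDFs mentioned above; the normalization rule shown (trimming and upper-casing a field) is an illustrative stand-in for the real logic:

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Pig UDF that trims and upper-cases the first field of each tuple; after REGISTERing the jar
// it would be invoked from Pig Latin as, for example, FOREACH raw GENERATE NormalizeField(line).
public class NormalizeField extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        // Tolerate empty or null tuples so malformed records do not fail the whole job.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}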
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Java, Cloudera, Linux, XML, MySQL, MySQL Workbench, Java 6, Eclipse, Cassandra.
Java developer
Confidential
Responsibilities:
- Developed the UI using HTML, CSS, JavaScript and AJAX.
- Used the Oracle IDE to create web services for the EI application using a top-down approach.
- Worked on creating the basic framework for a Spring and web-services-enabled environment for EI applications as a web service provider.
- Created a SOAP handler to enable authentication and audit logging during web service calls (a sketch follows this list).
- Created service layer APIs and domain objects using Struts.
- Designed, developed and configured the applications using Struts Framework.
- Created Spring DAO classes to call the database through the Spring JPA ORM framework.
- Wrote PL/SQL queries, created stored procedures, and invoked stored procedures using Spring JDBC.
- Used Exception handling and Multi-threading for the optimum performance of the application.
- Used the Core Java concepts to implement the Business Logic.
- Created High level Design Document for Web Services and EI common framework and participated in review discussion meeting with client.
- Deployed and configured the data source for the database in the WebLogic application server, utilized log4j for tracking errors and debugging, and maintained the source code using Subversion.
- Used the ClearCase tool for build management and Ant for application configuration and integration.
- Created, executed, and documented the tests necessary to ensure that an application and/or environment meets performance requirements (technical, functional and user interface).
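A minimal sketch of a JAX-WS SOAP handler along the lines described above; the audit sink (standard output here) and the class name are illustrative, and the authentication check mentioned in the bullet is omitted:

import java.util.Set;

import javax.xml.namespace.QName;
import javax.xml.ws.handler.MessageContext;
import javax.xml.ws.handler.soap.SOAPHandler;
import javax.xml.ws.handler.soap.SOAPMessageContext;

// Intercepts every inbound and outbound SOAP message on the service endpoint
// and writes a one-line audit record.
public class AuditSoapHandler implements SOAPHandler<SOAPMessageContext> {

    @Override
    public boolean handleMessage(SOAPMessageContext context) {
        boolean outbound = (Boolean) context.get(MessageContext.MESSAGE_OUTBOUND_PROPERTY);
        System.out.println("SOAP " + (outbound ? "response sent" : "request received"));
        return true; // continue with the rest of the handler chain
    }

    @Override
    public boolean handleFault(SOAPMessageContext context) {
        return true; // let faults propagate to the client
    }

    @Override
    public void close(MessageContext context) { }

    @Override
    public Set<QName> getHeaders() {
        return null; // this handler does not claim any specific SOAP headers
    }
}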
Environment: Windows, Linux, Rational Clear Case, Java, JAX-WS, SOAP, WSDL, JSP, Java Script, Ajax, Oracle IDE, log4j, ANT, struts, JPA, XML, HTML5, CSS3, Oracle WebLogic.
Software Developer - Intern
Confidential
Responsibilities:
- Worked as a Development Team Member.
- Coordinated with Business Analysts to gather the requirement and prepare data flow diagrams and technical documents.
- Identified Use Cases and generated Class, Sequence and State diagrams using UML.
- Used JMS for the asynchronous exchange of critical business data and events among J2EE components and legacy system.
- Involved in Designing, coding and maintaining of Entity Beans and Session Beans using EJB 2.1 Specification.
- Involved in the development of Web Interface using MVC Struts Framework.
- The user interface was developed using JSP and tags, CSS, HTML and JavaScript.
- Database connection was made using properties files.
- Used a session filter to implement timeouts for idle users.
- Used stored procedures to interact with the database.
- Persistence was developed using the DAO pattern and the Hibernate framework.
Environment: J2EE, Struts 1.0, JavaScript, Swing, CSS, HTML, XML, XSLT, DTD, JUnit, EJB 2.1, Oracle, Tomcat, Eclipse, WebLogic 7.0/8.1.