Hadoop Developer Resume
San Jose, CA
PROFESSIONAL SUMMARY:
- 8+ years of IT experience across the complete software development life cycle, applying object-oriented analysis and design with Big Data technologies / the Hadoop ecosystem, SQL, Java, and J2EE technologies.
- Around 5 years of experience working on Big Data and Data Science, building advanced customer insight and product analytics platforms using Big Data and open source technologies.
- Broad experience in Data Mining, Real-Time Analytics, Business Intelligence, Machine Learning and Web Development.
- Strong skills in developing applications involving Big Data technologies like Hadoop, Spark, Elasticsearch, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Hortonworks, Cloudera, Mahout, Avro and Scala.
- Skilled in programming within the MapReduce framework and the Hadoop ecosystem.
- Very good experience in designing and implementing MapReduce jobs to support distributed processing of large data sets on the Hadoop cluster.
- Experience implementing an inverted indexing algorithm using MapReduce (a sketch follows this summary).
- Extensive experience in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Hands on experience in migrating complex MapReduce programs into Apache Spark RDD transformations.
- Experience in setting up standards and processes for Hadoop based application design and implementation.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting and HDFS.
- Worked on developing ETL processes to load data from multiple data sources into HDFS using Flume and Sqoop, perform structural modifications using MapReduce and Hive, and analyze data using visualization/reporting tools.
- Experience in writing Pig UDFs (Eval, Filter, Load and Store) and macros.
- Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality.
- Exposure to using Apache Kafka to develop data pipelines that carry logs as a stream of messages between producers and consumers.
- Experience in integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
- Very good understanding of NoSQL databases like MongoDB, Cassandra and HBase.
- Experience in coordinating Cluster services through ZooKeeper.
- Hands-on experience in setting up Apache Hadoop, MapR and Hortonworks clusters.
- Good knowledge of Apache Hadoop cluster planning, including choosing the hardware and operating systems to host an Apache Hadoop cluster.
- Experience with Hadoop distributions like Cloudera, Hortonworks, BigInsights, MapR, Windows Azure, and Impala.
- Experience using integrated development environments like Eclipse, NetBeans, JDeveloper and MyEclipse.
- Excellent understanding of relational databases as they pertain to application development, using several RDBMSs including IBM DB2, Oracle 10g, MS SQL Server 2005/2008 and MySQL, with strong database skills including SQL, stored procedures and PL/SQL.
- Working knowledge of J2EE development with the Spring, Struts and Hibernate frameworks in various projects, and expertise in Web Services (JAXB, SOAP, WSDL, RESTful) development.
- Experience in writing tests using Specs2, ScalaTest, Selenium, TestNG and JUnit.
- Ability to work on diverse application servers like JBoss, Apache Tomcat and WebSphere.
- Worked on different operating systems like UNIX/Linux and Windows XP.
- A passion for learning new things (new languages and new implementations) has kept me up to date with the latest trends and industry standards.
- Proficient in adapting to new work environments and technologies.
- Quick learner and self-motivated team player with excellent interpersonal skills.
- Well focused and able to meet expected deadlines on target.
- Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
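A minimal sketch, in Java, of the inverted-index MapReduce pattern mentioned above, assuming plain-text input where the source file name stands in for the document ID; class names and the tokenization rule are illustrative, not taken from any specific project:

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class InvertedIndex {

    // Mapper: emit (term, documentName) for every token in the input line.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String docName = ((FileSplit) context.getInputSplit()).getPath().getName();
            for (String token : line.toString().toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    context.write(new Text(token), new Text(docName));
                }
            }
        }
    }

    // Reducer: collapse the document names for each term into a de-duplicated posting list.
    public static class PostingReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text term, Iterable<Text> docs, Context context)
                throws IOException, InterruptedException {
            Set<String> postings = new HashSet<>();
            for (Text doc : docs) {
                postings.add(doc.toString());
            }
            context.write(term, new Text(String.join(",", postings)));
        }
    }
}

The driver boilerplate that wires the mapper and reducer into a Job with Text output key/value classes is omitted here.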
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, MongoDB, Avro, Hadoop Streaming, Cassandra, Oozie, Zookeeper, Spark, Storm, Kafka
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, JavaBeans
IDEs: Eclipse, NetBeans, WSAD, Oracle SQL Developer
Big data Analytics: Datameer 2.0.5
Frameworks: MVC, Struts, Hibernate, Spring and MRUnit
Languages: C, C++, Java, Python, Linux shell scripts, SQL
Databases: Cassandra, MongoDB, HBase, Teradata, Oracle, MySQL, DB2
Web Servers: JBoss, WebLogic, WebSphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, CSS, AJAX, JSON, Servlets, JSP
Reporting Tools: Jasper Reports, iReports
ETL Tools: Informatica, Pentaho
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential, San Jose, CA
Responsibilities:
- Developed Java code that streams web log data into Hive using REST services.
- Worked on migrating data from MongoDB to Hadoop.
- Worked on integrating SFDC with Hadoop.
- Developed Java code that streams Salesforce data into Hive using the Streaming API.
- Executed Hive queries on tables stored in Hive to perform data analysis and meet the business requirements.
- Worked on configuring the ZooKeeper and Kafka clusters.
- Worked on creating Kafka topics and partitions and writing custom partitioner classes.
- Worked on Big Data Integration and Analytics based on Hadoop, Spark and Kafka.
- Developed Spark code using Python and Spark-SQL/Streaming for faster processing of data.
- Built a real-time pipeline for streaming data using Kafka and Spark Streaming (a sketch follows this list).
- Installed and configured a Hadoop cluster using Ambari, along with Hive.
- Processing large data sets in parallel across the Hadoop cluster for pre-processing.
- Developed the code for importing and exporting data into HDFS using Sqoop and Flume.
- Wrote shell scripts that run multiple Hive jobs to incrementally refresh different Hive tables, which are used to generate reports in Tableau for business use.
- Moved log data from the Logstash server into Hadoop using Flume.
- Wrote Java code that executes different MongoDB queries.
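A minimal sketch, in Java, of the Kafka-to-Spark Streaming pipeline referenced above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group and the simple per-batch count are illustrative placeholders for the actual log-processing logic:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class WebLogStreamJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("weblog-stream");
        // 10-second micro-batches; the real interval would depend on the reporting SLA.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");   // placeholder broker address
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "weblog-consumers");        // placeholder consumer group
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct stream from the "weblogs" topic (placeholder name).
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("weblogs"), kafkaParams));

        // Stand-in transformation: count log lines per micro-batch and print to the driver log.
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}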
Environment: Hadoop, Hive, Flume, Linux, Shell Scripting, Java, Eclipse, MongoDB, Kafka, Spark, Zookeeper, Sqoop, Ambari.
Hadoop Developer
Confidential, El Segundo, CA
Responsibilities:
- Developed MapReduce jobs in Java for data cleansing and preprocessing, and implemented complex data analytics algorithms.
- Developed MapReduce programs to join data from different data sources using optimized joins, implementing bucketed joins or map joins depending on the requirement.
- Imported data from structured data sources into HDFS using Sqoop incremental imports.
- Implemented custom Kafka partitioners to send data to different categorized topics (a partitioner sketch follows this list).
- Implemented a Storm topology with stream groupings to perform real-time analytical operations.
- Implemented Kafka spouts for streaming data and different bolts to consume the data.
- Created Hive tables and partitions and implemented incremental imports to perform ad-hoc queries on structured data.
- Created Hive generic UDFs to process business logic with HiveQL.
- Involved in optimizing Hive queries and improving performance by configuring Hive query parameters.
- Used Cassandra Query Language (CQL) to perform analytics on time series data.
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
- Responsible for running Hadoop streaming jobs to process terabytes of XML Data.
- Development of Oozie workflow for orchestrating and scheduling the ETL process.
- Involved in implementing the Avro, ORC, and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Wrote Unix shell scripts in combination with Talend data maps to process the source files and load them into the staging database.
- Involved in creation of virtual machines and infrastructure in the Azure Cloud environment.
- Involved in developing Azure Web role and Worker roles.
- Worked on retrieving transaction data from the RDBMS into HDFS, computing the total transacted amount per user using MapReduce, and saving the output in a Hive table.
- Used Talend Studio 6.2 to re-write the SSIS ETL packages.
- Experience in implementing Kafka consumers and producers by extending the Kafka high-level API in Java and ingesting data into HDFS or HBase depending on the context.
- Worked on creating the workflow to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Developed SQL scripts using Spark for handling different data sets and verified their performance against MapReduce jobs.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs and Python.
- Worked on Amazon AWS concepts like EMR and EC2 web services for fast and efficient processing of Big Data.
- Responsible for maintaining and expanding the AWS cloud infrastructure using AWS services (SNS, SQS).
- Developed Spark scripts using Python shell commands as per the requirements.
- Experience implementing machine learning techniques in Spark using Spark MLlib.
- Involved in moving data from Hive tables into Cassandra for real-time analytics.
- Involved in using Hadoop benchmarks for monitoring and testing the Hadoop cluster.
- Involved in implementing test cases and testing MapReduce programs using MRUnit and other mocking frameworks.
- Involved in cluster maintenance, which includes adding and removing cluster nodes, cluster monitoring and troubleshooting, and reviewing and managing data backups and Hadoop log files.
- Involved in implementing Maven build scripts for Maven projects and integrating the builds with Jenkins.
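A minimal sketch of a custom Kafka partitioner of the kind mentioned above; the "priority-" category prefix and the reserved partition number are illustrative assumptions rather than the actual business rule:

import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Sends records whose key carries a "priority-" prefix (an illustrative category marker)
// to a reserved partition; all other records are spread across partitions by key hash.
public class CategoryPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        String k = key == null ? "" : key.toString();
        if (k.startsWith("priority-")) {
            return 0;   // reserved partition for the "hot" category
        }
        // Mask off the sign bit so the result is always a valid, non-negative partition index.
        return (k.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}

The producer picks this class up through its configuration, e.g. props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, CategoryPartitioner.class.getName()).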
Environment: Hadoop, Cloudera, MapReduce, Hive, Spark, Avro, Kafka, Storm, Linux, Sqoop, Shell Scripting, Oozie, Cassandra, Git, XML, Scala, Java, Maven, Eclipse, Oracle.
Hadoop Developer
Confidential, Fayetteville, NY
Responsibilities:
- Involved in the full life cycle of the project, from design, analysis, and logical and physical architecture modeling through development, implementation and testing.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Developed algorithms for identifying influencers within specified social network channels.
- Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Analyzing data with Hive, Pig and Hadoop Streaming.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
- Used OOZIE Operational Services for batch processing and scheduling workflows dynamically.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Experienced in working with Apache Storm.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Performed data mining investigations to find new insights related to customers.
- Involved in forecast based on the present results and insights derived from data analysis.
- Developed a domain-specific sentiment analysis system using supervised machine learning.
- Involved in collecting data and identifying data patterns to build a trained model using machine learning.
- Configured the Hadoop environment with Kerberos authentication, NameNodes, and DataNodes.
- Designed Sources to Targets mappings from SQL Server, Excel/Flat files to Oracle using Informatica Power Center.
- Created Data Marts and loaded the data using Informatica Tool.
- Developed and generated insights based on brand conversations, which in turn helped drive brand awareness, engagement and traffic to social media pages.
- Involved in identifying topics and trends and building context around the brand.
- Developed different formulas for calculating engagement on social media posts.
- Involved in identifying and analyzing defects, questionable function errors and inconsistencies in output.
- Involved in reviewing technical documentation and providing feedback.
- Involved in fixing issues arising from duration testing.
Environment: Java, NLP, HBase, Machine Learning, Hadoop, HDFS, MapReduce, Hortonworks, Hive, Apache Storm, Sqoop, Flume, Oozie, Apache Kafka, Zookeeper, MySQL, and Eclipse
Sr. Java / Hadoop Developer
Confidential, Houston, TX
Responsibilities:
- Developed high-level design documents, Use case documents, detailed design documents and Unit Test Plan documents and created Use Cases, Class Diagrams and Sequence Diagrams using UML.
- Extensive involvement in database design and development, including coding of stored procedures, DDL & DML statements, functions and triggers.
- Utilized Hibernate for object/relational mapping to provide transparent persistence onto the SQL server.
- Developed a portlet-style user experience using Ajax and jQuery.
- Used Spring IoC for creating the beans to be injected at run time.
- Modified the existing JSP pages using JSTL.
- Used Spring Tool Suite (STS) as the IDE for development.
- Used jQuery for client-side JavaScript methods.
- Developed Pig UDFs to pre-process the data for analysis (a sketch follows this list).
- Built a custom cross-platform architecture using Java, Spring Core/MVC and Hibernate through the Eclipse IDE.
- Involved in writing PL/SQL for the stored procedures.
- Designed UI screens using JSP, Struts tags, HTML, jQuery. Used JavaScript for client side validation.
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
- Responsible to manage data coming from different sources.
- Worked with business teams and created Hive queries for ad hoc access.
- Loaded daily data from websites to Hadoop cluster by using Flume.
- Created complex Hive tables and executed complex Hive queries on Hive warehouse.
- Wrote MapReduce code to convert unstructured data to semi structured data.
- Used Pig for the extraction, transformation and loading of semi-structured data.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Created Hive tables and worked on them using HiveQL.
- Used Pig as an ETL tool to do transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Designed and implemented MapReduce jobs to support distributed data processing.
- Supported MapReduce programs running on the cluster.
- Involved in HDFS maintenance and loading of structured and unstructured data.
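A minimal sketch of a Pig EvalFunc-style UDF like the pre-processing UDFs mentioned above; the normalization rule shown (trimming and upper-casing a field) is an illustrative stand-in for the real logic:

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Pig UDF that trims and upper-cases the first field of each tuple; after REGISTERing the jar
// it would be invoked from Pig Latin as, for example, FOREACH raw GENERATE NormalizeField(line).
public class NormalizeField extends EvalFunc<String> {

    @Override
    public String exec(Tuple input) throws IOException {
        // Tolerate empty or null tuples so malformed records do not fail the whole job.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}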
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Java, Cloudera, Linux, XML, MySQL, MySQL Workbench, Java 6, Eclipse, Cassandra.
Java developer
Confidential
Responsibilities:
- Developed the UI using HTML, CSS, JavaScript and AJAX.
- Used the Oracle IDE to create web services for the EI application using a top-down approach.
- Worked on creating the basic framework for a Spring and web-services-enabled environment for EI applications as a web service provider.
- Created a SOAP handler to enable authentication and audit logging during web service calls (a sketch follows this list).
- Created service layer APIs and domain objects using Struts.
- Designed, developed and configured the applications using Struts Framework.
- Created Spring DAO classes to call the database through the Spring JPA ORM framework.
- Wrote PL/SQL queries, created stored procedures, and invoked stored procedures using Spring JDBC.
- Used Exception handling and Multi-threading for the optimum performance of the application.
- Used the Core Java concepts to implement the Business Logic.
- Created High level Design Document for Web Services and EI common framework and participated in review discussion meeting with client.
- Deployed and configured the data source for the database in the WebLogic application server, utilized log4j for tracking errors and debugging, and maintained the source code using Subversion.
- Used the ClearCase tool for build management and Ant for application configuration and integration.
- Created, executed, and documented the tests necessary to ensure that an application and/or environment meets performance requirements (technical, functional and user interface).
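A minimal sketch of a JAX-WS SOAP handler along the lines described above; the audit sink (standard output here) and the class name are illustrative, and the authentication check mentioned in the bullet is omitted:

import java.util.Set;

import javax.xml.namespace.QName;
import javax.xml.ws.handler.MessageContext;
import javax.xml.ws.handler.soap.SOAPHandler;
import javax.xml.ws.handler.soap.SOAPMessageContext;

// Intercepts every inbound and outbound SOAP message on the service endpoint
// and writes a one-line audit record.
public class AuditSoapHandler implements SOAPHandler<SOAPMessageContext> {

    @Override
    public boolean handleMessage(SOAPMessageContext context) {
        boolean outbound = (Boolean) context.get(MessageContext.MESSAGE_OUTBOUND_PROPERTY);
        System.out.println("SOAP " + (outbound ? "response sent" : "request received"));
        return true; // continue with the rest of the handler chain
    }

    @Override
    public boolean handleFault(SOAPMessageContext context) {
        return true; // let faults propagate to the client
    }

    @Override
    public void close(MessageContext context) { }

    @Override
    public Set<QName> getHeaders() {
        return null; // this handler does not claim any specific SOAP headers
    }
}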
Environment: Windows, Linux, Rational Clear Case, Java, JAX-WS, SOAP, WSDL, JSP, Java Script, Ajax, Oracle IDE, log4j, ANT, struts, JPA, XML, HTML5, CSS3, Oracle WebLogic.
Software Developer - Intern
Confidential
Responsibilities:
- Worked as a Development Team Member.
- Coordinated with Business Analysts to gather the requirement and prepare data flow diagrams and technical documents.
- Identified Use Cases and generated Class, Sequence and State diagrams using UML.
- Used JMS for the asynchronous exchange of critical business data and events among J2EE components and legacy system.
- Involved in Designing, coding and maintaining of Entity Beans and Session Beans using EJB 2.1 Specification.
- Involved in the development of Web Interface using MVC Struts Framework.
- The user interface was developed using JSP and tags, CSS, HTML and JavaScript.
- Database connection was made using properties files.
- Used a session filter to implement timeouts for idle users.
- Used stored procedures to interact with the database.
- Persistence was developed using the DAO pattern and the Hibernate framework.
Environment: J2EE, Struts 1.0, JavaScript, Swing, CSS, HTML, XML, XSLT, DTD, JUnit, EJB 2.1, Oracle, Tomcat, Eclipse, WebLogic 7.0/8.1.