Hadoop/Spark Developer Resume
Austin, TX
PROFESSIONAL SUMMARY:
- 8+ years of IT experience across the complete software development life cycle, using object-oriented analysis and design with Big Data technologies (Hadoop ecosystem), SQL, Java, and J2EE technologies.
- Around 5 years of experience working on Big Data and Data Science, building advanced customer insight and product analytics platforms using Big Data and open source technologies.
- Wide experience in data mining, real-time analytics, business intelligence, machine learning, and web development.
- Leveraged strong skills in developing applications involving Big Data technologies like Hadoop, Spark, Elasticsearch, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Hortonworks, Cloudera, Mahout, Avro, and Scala.
- Skilled in programming with the MapReduce framework and Hadoop ecosystem.
- Very good experience in designing and implementing MapReduce jobs to support distributed data processing and process large data sets utilizing the Hadoop cluster.
- Experience in implementing an inverted indexing algorithm using MapReduce.
- Extensive experience in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Hands on experience in migrating complex MapReduce programs into Apache Spark RDD transformations.
- Experience in setting up standards and processes for Hadoop based application design and implementation.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting, and HDFS.
- Developed ETL processes to load data from multiple data sources into HDFS using Flume and Sqoop, performed structural modifications using MapReduce and Hive, and analyzed data using visualization/reporting tools.
- Experience in writing Pig UDFs (Eval, Filter, Load, and Store) and macros.
- Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
- Exposure to using Apache Kafka to develop data pipelines that carry logs as streams of messages using producers and consumers.
- Experience in integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
- Very good understanding of NoSQL databases like MongoDB, Cassandra, and HBase.
- Experience in coordinating Cluster services through ZooKeeper.
- Hands on experience in setting up Apache Hadoop, MapR and Hortonworks Clusters.
- Good knowledge on Apache Hadoop Cluster planning which includes choosing the Hardware and operating systems to host an Apache Hadoop cluster.
- Experience with Hadoop distributions and platforms like Cloudera, Hortonworks, BigInsights, MapR, and Windows Azure, as well as Impala.
- Exposure to Mesos, Marathon, and ZooKeeper cluster environments for application deployments and Docker containers.
- Excellent understanding of relational databases as they pertain to application development, using several RDBMSs including IBM DB2, Oracle 10g, MS SQL Server 2005/2008, and MySQL, with strong database skills including SQL, stored procedures, and PL/SQL.
- Working knowledge of J2EE development with the Spring, Struts, and Hibernate frameworks in various projects and expertise in Web Services (JAXB, SOAP, WSDL, RESTful) development.
- Experience in writing tests using Specs2, ScalaTest, Selenium, TestNG, and JUnit.
- Ability to work on diverse application servers like JBoss, Apache Tomcat, and WebSphere.
- Worked on different OS like UNIX/Linux, Windows XP, and Windows
- A passion for learning new things (new languages or new implementations) has kept me up to date with the latest trends and industry standards.
- Proficient in adapting to the new Work Environment and Technologies.
- Quick learner and self-motivated team player with excellent interpersonal skills.
- Well focused and can meet the expected deadlines on target.
- Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, MongoDB, Avro, Hadoop Streaming, Cassandra, Oozie, ZooKeeper, Spark, Storm, Kafka
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans
IDEs: Eclipse, NetBeans, WSAD, Oracle SQL Developer
Big data Analytics: Datameer 2.0.5
Frameworks: MVC, Struts, Hibernate, Spring and MRUnit
Languages: C, C++, Java, Python, Linux shell scripts, SQL
Databases: Cassandra, MongoDB, HBase, Teradata, Oracle, MySQL, DB2
Web Servers: JBoss, WebLogic, WebSphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, CSS, AJAX, JSON, Servlets, JSP
Reporting Tools: Jasper Reports, iReports
ETL Tools: Informatica, Pentaho
PROFESSIONAL EXPERIENCE:
Confidential, Austin, TX
Hadoop/Spark Developer
Responsibilities:
- Integrated Spring and Hibernate into the existing COMMS application.
- Worked with business teams and created Hive queries for ad hoc access.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Responsible for analyzing and cleansing raw data by performing Hive queries on data.
- Involved in optimizing Hive queries and improving performance by configuring Hive query parameters.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Filtered the data using Spark/Scala and generated the reports using Apache Hive.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that receives data from Kafka in near real time.
- Implemented Spark applications from the existing MapReduce framework for better performance.
- Implemented Kafka Custom partitioners to send data to different categorized topics.
- Developed Kafka consumer API in Scala for consuming data from Kafka topics.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Modified Maven scripts to build the JAR, WAR, and EAR files.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
- Worked with different teams to ensure data quality and availability.
- Responsible for generating actionable insights from complex data to drive real business results for various applications teams and worked in Agile Methodology projects extensively.
- Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
Environment: Linux, Java, Eclipse, Jscape, Spring, Hibernate, JUnit, Bitbucket, Jenkins, MySQL, Tomcat, IBM SFG, Spark, Scala, Hive, and Kafka.
Confidential, San Jose, CA
Hadoop Developer
Responsibilities:
- Developed Java code that streams JSON data into Hive using REST services.
- Worked on migrating data from MongoDB to Hadoop.
- Worked on integrating SFDC with Hadoop.
- Developed Java code that can stream Salesforce data into Hive using the Streaming API.
- Executed Hive queries on tables stored in Hive to perform data analysis to meet the business requirements.
- Configured ZooKeeper and Kafka clusters.
- Created Kafka topics and partitions and wrote custom partitioner classes.
- Worked on Big Data integration and analytics based on Hadoop, Spark, and Kafka.
- Developed Spark code using Python and Spark SQL/Spark Streaming for faster processing of data.
- Streamed data in real time using Spark with Kafka.
- Involved in loading data into Kafka producers from REST endpoints and transferring the data to Kafka brokers.
- Experienced in writing real-time processing jobs and core jobs using Spark Streaming with Kafka.
- Developed analytical components using Spark, Apache Mesos, and Spark Streaming.
- Built a real-time pipeline for streaming data using Kafka and Spark Streaming.
- Installed and configured a Hadoop cluster using Ambari, along with Hive.
- Processed large data sets in parallel across the Hadoop cluster for pre-processing.
- Developed the code for importing and exporting data into HDFS using Sqoop and Flume.
- Loaded CSV/TXT/Avro/Parquet files using Python and Java in the Spark framework, processed the data by creating Spark DataFrames and RDDs, and saved the files in Parquet format in HDFS to load into the fact table using ORC Reader.
- Wrote shell scripts that run multiple Hive jobs to automate incremental updates of the Hive tables used to generate Tableau reports for business use.
- Moved log data from the Logstash server into Hadoop using Flume.
- Wrote Java scripts that execute different MongoDB queries.
Environment: Hadoop, Hive, Flume, Linux, Shell Scripting, Java, REST, Eclipse, MongoDB, Kafka, Spark, Python, ZooKeeper, Sqoop, Ambari.
Confidential, El Segundo, CA
Hadoop Developer
Responsibilities:
- Developed MapReduce jobs in Java for data cleansing and preprocessing, and implemented complex data analytical algorithms.
- Developed MapReduce programs to join data from different data sources using optimized joins, implementing bucketed joins or map joins depending on the requirement.
- Imported data from structured data source into HDFS using Sqoop incremental imports.
- Implemented Kafka Custom partitioners to send data to different categorized topics.
- Implemented a Storm topology with stream grouping to perform real-time analytical operations.
- Experience in implementing Kafka spouts for streaming data and different bolts to consume data.
- Created Hive tables and partitions and implemented incremental imports to perform ad hoc queries on structured data.
- Created Hive Generic UDFs to process business logic with HiveQL.
- Involved in optimizing Hive queries and improving performance by configuring Hive query parameters.
- Used Cassandra Query Language (CQL) to perform analytics on time series data.
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
- Responsible for running Hadoop streaming jobs to process terabytes of XML data.
- Developed Oozie workflows for orchestrating and scheduling the ETL process.
- Involved in implementing the Avro, ORC, and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Wrote Unix shell scripts in combination with the Talend data maps to process the source files and load them into the staging database.
- Wrote a SQL stored procedure in Hue to access data from Impala.
- Evaluated usage of command line Hue for Workflow Orchestration.
- Involved in creating virtual machines and infrastructure in the Azure Cloud environment.
- Involved in developing Azure Web roles and Worker roles.
- Retrieved transaction data from RDBMS into HDFS, computed the total transacted amount per user using MapReduce, and saved the output in a Hive table.
- Used Talend Studio 6.2 to rewrite the SSIS ETL packages.
- Involved in verifying cleaned data using the Talend tool with other departments.
- Involved in automating the FTP process in Talend and FTPing the files in UNIX.
- Experience in implementing Kafka consumers and producers by extending the Kafka high-level API in Java and ingesting data into HDFS or HBase depending on the context.
- Created Talend jobs to copy files from one server to another and utilized Talend FTP components.
- Created the workflow to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Developed SQL scripts using Spark for handling different data sets and verifying the performance against MapReduce jobs.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs.
- Developed Spark jobs using Scala on top of YARN/MRv2 for interactive and batch analysis.
- Implemented monitoring on all the NiFi flows to send notifications when no data has passed through a flow for longer than a specified time.
- Created NiFi flows to trigger Spark jobs and used PutEmail processors to send notifications on any failures.
- Worked on Amazon AWS concepts like EMR and EC2 web services for fast and efficient processing of Big Data.
- Responsible for maintaining and expanding AWS (Cloud Services) infrastructure using AWS (SNS, SQS).
- Involved in converting Hive/SQL queries into Spark transformations using Scala.
- Involved in moving data from Hive tables into Cassandra for real-time analytics on Hive tables.
Environment: Hadoop, Cloudera, MapReduce, Hive, Spark, Scala, Avro, Kafka, Storm, Linux, Sqoop, Shell Scripting, Oozie, Cassandra, Git, XML, Java, REST, Maven, Eclipse, Oracle.
Confidential, Fayetteville, NY
Java / Hadoop Developer
Responsibilities:
- Involved in the full life cycle of the project, from design, analysis, and logical and physical architecture modeling through development, implementation, and testing.
- Responsible for managing data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.
- Developed MapReduce programs to parse the raw JSON data and store the refined data in tables.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Developed algorithms for identifying influencers within specified social network channels.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Analyzed data with Hive, Pig, and Hadoop Streaming.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Experienced in working with Apache Storm.
- Designed the Cassandra data schema and implemented real-time data pipelines from the Kafka messaging system through a Flink streaming layer, with a sink to Cassandra.
- Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
- Performed data mining investigations to find new insights related to customers.
- Involved in forecasting based on the present results and insights derived from data analysis.
- Developed a domain-specific sentiment analysis system using machine learning concepts and a supervised learning methodology.
- Involved in collecting data and identifying data patterns to build a trained model using machine learning.
- Configured the Hadoop environment with Kerberos authentication, NameNodes, and DataNodes.
- Designed source-to-target mappings from SQL Server and Excel/flat files to Oracle using Informatica PowerCenter.
- Created data marts and loaded the data using the Informatica tool.
- Developed and generated insights based on brand conversations, which in turn helped drive brand awareness, engagement, and traffic to social media pages.
- Involved in identification of topics and trends and building context around that brand.
- Developed different formulas for calculating engagement on social media posts.
- Involved in identifying and analyzing defects, questionable function errors, and inconsistencies in output.
- Involved in reviewing technical documentation and providing feedback.
- Involved in fixing issues arising out of duration testing.
Environment: Java, NLP, HBase, Machine Learning, Hadoop, HDFS, MapReduce, Hortonworks, Hive, Apache Storm, Sqoop, Flume, Oozie, Apache Kafka, ZooKeeper, MySQL, and Eclipse.
Confidential
Java Developer
Responsibilities:
- Developed high-level design documents, use case documents, detailed design documents, and Unit Test Plan documents, and created use cases, class diagrams, and sequence diagrams using UML.
- Extensive involvement in database design, development, and coding of stored procedures, DDL and DML statements, functions, and triggers.
- Utilized Hibernate for object/relational mapping for transparent persistence onto the SQL Server.
- Developed a portlet-like user experience using Ajax and jQuery.
- Used Spring IoC to create the beans to be injected at run time.
- Modified the existing JSP pages using JSTL.
- Used Spring Tool Suite (STS) as the IDE for development.
- Used jQuery scripts for client-side JavaScript methods.
- Developed Pig UDFs to pre-process the data for analysis.
- Built a custom cross-platform architecture using Java, Spring Core/MVC, and Hibernate through the Eclipse IDE.
- Involved in writing PL/SQL for the stored procedures.
- Designed UI screens using JSP, Struts tags, HTML, jQuery. Used JavaScript for client side validation.
- Involved in the development of Web Interface using MVC Struts Framework.
- Developed the user interface using JSP and tags, CSS, HTML, and JavaScript.
- Database connection was made using properties files.
- Used a session filter to implement timeouts for idle users.
- Used stored procedures to interact with the database.
Environment: Linux, MySQL, MySQL Workbench, Eclipse, J2EE, Struts 1.0, JavaScript, Swing, CSS, HTML, XML, XSLT, DTD, JUnit, EJB 2.1, Tomcat, WebLogic 7.0/8.1
Confidential
Java Developer
Responsibilities:
- Developed the UI using HTML, CSS, JavaScript, and AJAX.
- Used Oracle IDE to create web services for the EI application using a top-down approach.
- Created the basic framework for a Spring and web services enabled environment for EI applications as a web service provider.
- Created SOAP Handler to enable authentication and audit logging during Web Service calls.
- Created service layer APIs and domain objects using Struts.
- Designed, developed and configured the applications using Struts Framework.
- Created Spring DAO classes to call the database through the Spring JPA ORM framework.
- Wrote PL/SQL queries, created stored procedures, and invoked the stored procedures using Spring JDBC.
- Used Exception handling and Multi-threading for the optimum performance of the application.
- Used the Core Java concepts to implement the Business Logic.
- Created High level Design Document for Web Services and EI common framework and participated in review discussion meeting with client.
- Deployed and configured the data source for the database in the WebLogic application server, utilized log4j for tracking errors and debugging, and maintained the source code using Subversion.
- Used the ClearCase tool for build management and ANT for application configuration and integration.
- Created, executed, and documented the tests necessary to ensure that an application and/or environment meets performance requirements (technical, functional, and user interface).
Environment: Windows, Linux, Rational ClearCase, Java, JAX-WS, SOAP, WSDL, JSP, JavaScript, Ajax, Oracle IDE, log4j, ANT, Struts, JPA, XML, HTML5, CSS3, Oracle WebLogic.
