Sr. Hadoop/Spark Developer Resume
Sterling, VA
SUMMARY:
- 6+ years of professional IT experience, including the Big Data ecosystem and Java/J2EE technologies.
- Excellent experience with Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, and Flume.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- In-depth understanding of data structures and algorithms.
- Experience in managing and reviewing Hadoop log files.
- Strong backend experience with Python, Scala, HiveQL, and Spark SQL.
- Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
- Involved in setting up standards and processes for Hadoop-based application design and implementation.
- Developed simple to complex MapReduce streaming jobs in Python, integrated with Hive and Pig.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE and Core Java design patterns.
- Experience managing Hadoop clusters using Cloudera Manager.
- Very good experience across the complete project life cycle: design, development, testing, and implementation of client-server and web applications.
- Experience administering Red Hat Linux: installation, configuration, troubleshooting, security, backup, performance monitoring, and fine-tuning.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
- Strong scripting skills in Shell, Perl, and Python.
- Wrote scripts to deploy monitors and checks and to automate critical system administration functions.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
- Experience with Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML, and HTML.
- Ability to adapt to evolving technology, with a strong sense of responsibility and accomplishment.
SKILL SET:
Programming Languages: Scala, Python, Java
Hadoop/Big Data: HDFS, MapReduce, Spark, Yarn, Kafka, PIG, HIVE, Sqoop, Storm, Flume, Oozie, Impala, HBase, Hue, Zookeeper
NoSQL Technologies: Cassandra, MongoDB, HBase
Big Data Distributions: Hortonworks, Cloudera, Amazon EMR
Java/J2EE Technologies: Servlets, JSP, JDBC, EJB, JAXB, JMS, JAX-RPC, JAX-WS, JAX-RS, Apache CXF
Frameworks: Struts, Spring, Hibernate, iBatis.
Web Technologies: HTML, CSS, JavaScript, jQuery, AngularJS, Ajax, Backbone.js, React, Node.js, Ext JS, Bootstrap.
Development Tools: Eclipse, NetBeans, IBM RAD, IntelliJ, Spring Tool Suite
Databases: MySQL, MS-SQL Server, IBM DB2, Oracle.
Operating Systems: Windows XP/Vista/7/8/10, UNIX, Linux, Mac OS.
Build Tools: Ant, Gradle, Maven, npm, Bower.
Web/ Application Servers: WebSphere, Apache Tomcat, WebLogic, JBoss.
PROFESSIONAL EXPERIENCE:
Confidential, Sterling, VA
Sr. Hadoop/Spark Developer
Responsibilities:
- Developed simple to complex MapReduce streaming jobs in Java for processing and validating data
- Developed data pipelines using MapReduce, Flume, and Sqoop to ingest customer behavioral data into HDFS for analysis
- Migrated MapReduce jobs to Spark jobs to discover trends in data usage by users
- Implemented Spark applications in Scala with Spark SQL for faster data processing
- Implemented algorithms for real-time analysis in Spark
- Imported data from AWS S3 into Spark DataFrames and performed transformations and actions on them (see the first sketch after this list)
- Used Spark for interactive queries, processing of streaming data, and integration with NoSQL databases for high-volume data
- Streamed real-time data using Kafka with Spark; used the Spark-Cassandra Connector to load data to and from Cassandra (see the second sketch after this list)
- Imported data from different sources into HDFS using Sqoop and performed transformations using Hive and MapReduce before loading the results back into HDFS.
- Exported the analyzed data to relational databases using Sqoop so the BI team could visualize it and generate reports
- Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis
- Analyzed the data using Hive queries (HiveQL)
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting; contributed to Hive performance tuning
- Developed HiveQL scripts to de-normalize and aggregate the data
- Created HBase tables and column families to store the user event data
- Wrote automated HBase test cases for data-quality checks using HBase command-line tools
- Scheduled and executed workflows in Oozie to run Hive and Pig jobs
- Used the Tez framework to build high-performance Pig and Hive jobs
- Configured Kafka to read and write messages from external programs and to handle real-time data
- Developed end-to-end data processing pipelines, from receiving data through the Kafka distributed messaging system to persisting it in HBase
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager
- Developed interactive shell scripts for scheduling various data-cleansing and data-loading processes
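A minimal sketch of the S3-to-DataFrame flow described above; the bucket, paths, and column names are illustrative, not from the actual project:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object S3EventAnalysis {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("S3EventAnalysis")
      .getOrCreate()

    // Hypothetical bucket and schema: read raw behavioral events from S3
    val events = spark.read
      .option("header", "true")
      .csv("s3a://example-bucket/customer-events/")

    // Transformations: drop invalid rows, aggregate events per user per day
    val dailyCounts = events
      .filter(col("user_id").isNotNull)
      .groupBy(col("user_id"), col("event_date"))
      .agg(count("*").alias("event_count"))

    // Action: materialize the result back to HDFS for downstream reporting
    dailyCounts.write.mode("overwrite").parquet("hdfs:///analytics/daily_counts")
    spark.stop()
  }
}
```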
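A sketch of the Kafka-to-Cassandra streaming path, assuming the kafka-0-10 direct stream API and the DataStax Spark-Cassandra Connector; the topic, keyspace, table, and host are hypothetical:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.kafka.common.serialization.StringDeserializer
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    // Cassandra contact point is illustrative
    val conf = new SparkConf()
      .setAppName("KafkaToCassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "event-consumers",
      "auto.offset.reset" -> "latest"
    )

    // "user-events" topic and the (userId, event) layout are assumptions
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("user-events"), kafkaParams)
    )

    // Persist each (key, value) pair into a Cassandra table via the connector
    stream
      .map(record => (record.key, record.value))
      .saveToCassandra("analytics", "user_events", SomeColumns("user_id", "event"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```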
Environment: Hadoop, Spark, MapReduce, Pig, Hive, Sqoop, Oozie, HBase, ZooKeeper, Kafka, Flume, Cloudera Manager, AWS S3, MySQL, Cassandra, multi-node Linux (Ubuntu) cluster, Windows, UNIX.
Confidential, Westerville, OH
Hadoop Developer/Admin
Responsibilities:
- Understood business needs, analyzed functional specifications, and mapped them into end-to-end data transformation pipeline designs.
- Created Hive tables and loaded data from Teradata using Sqoop.
- Imported and exported data between HDFS and relational databases using Sqoop.
- Extensively worked on importing metadata into Hive and migrated existing tables and applications to Hive and the AWS cloud.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Worked extensively with HiveQL, join operations, and custom UDFs, and gained good experience optimizing Hive queries.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
- Designed and implemented Hive and Pig UDFs in Python for evaluating, filtering, loading, and storing data.
- Developed MapReduce jobs for cleaning, validating, and transforming the data.
- Performed debugging and performance tuning of Pig and Hive scripts by analyzing their join, grouping, and aggregation operations.
- Wrote Pig scripts to transform raw data from several data sources.
- Used columnar file formats (RCFile, Parquet, and ORC); a sketch of a partitioned, Parquet-backed Hive table follows this list.
- Used Cloudera Manager to monitor workload and job performance and for capacity planning.
- Built applications using Maven and integrated them with continuous integration servers such as Jenkins.
- Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
- Gained hands-on experience with the whole ETL (Extract, Transform, Load) process.
- Developed ETL jobs to normalize the data and publish it in Impala.
- Worked with BI teams to generate reports and design ETL workflows in Tableau.
- Worked on NoSQL databases (HBase, MongoDB) for hybrid implementations.
- Used Impala to analyze data ingested into HBase and compute various metrics for dashboard reporting.
- Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
- Worked with the testing teams to fix bugs and ensure smooth, error-free code.
- Followed Agile methodology with daily Scrum meetings and sprint planning.
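A minimal sketch of the partitioned, Parquet-backed Hive layout mentioned above, driven through Spark's Hive support; the `web_logs` table name and schema are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object HiveLogTables {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveLogTables")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned, Parquet-backed table; names and schema are illustrative
    spark.sql("""
      CREATE TABLE IF NOT EXISTS web_logs (
        ip STRING,
        url STRING,
        status INT
      )
      PARTITIONED BY (log_date STRING)
      STORED AS PARQUET
    """)

    // Typical analysis query: error counts per day, pruned by partition
    spark.sql("""
      SELECT log_date, COUNT(*) AS errors
      FROM web_logs
      WHERE status >= 500 AND log_date >= '2017-01-01'
      GROUP BY log_date
    """).show()
  }
}
```

Partitioning by date lets the engine skip irrelevant directories entirely, while the columnar Parquet format reads only the referenced columns, which is where most of the query-tuning gains come from.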
Environment: Hadoop, MapReduce, HDFS, Hive, Python, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, HBase, Zookeeper, MongoDB, PL/SQL, MySQL, DB2, Teradata.
Confidential, Northlake, IL
Hadoop Developer/Admin
Responsibilities:
- Worked on analyzing and writing Hadoop MapReduce jobs using the MapReduce API, Pig, and Hive.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Involved in installing Hadoop Ecosystem components under Cloudera distribution.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Wrote MapReduce jobs using the Java API for data analysis and dimension/fact generation.
- Installed and configured Pig and wrote Pig Latin scripts.
- Wrote MapReduce jobs using Pig Latin.
- Built Spark from source and ran the Pig scripts on Spark rather than as MapReduce jobs for better performance.
- Imported data from MySQL to HDFS using Sqoop on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet the business requirements.
- Created Hive tables and worked with them using HiveQL.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Used Storm as an automatic retry mechanism for downloading and manipulating data after transient failures.
- Explored Spark to improve performance and optimize existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the sketch after this list).
- Used Storm to analyze large amounts of non-unique data points with low latency and high throughput.
- Attended weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
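A sketch of the pair-RDD style typically used when moving Pig/MapReduce work onto Spark; the input path and record layout are assumptions:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PairRddMetrics {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PairRddMetrics"))

    // Hypothetical tab-separated input: userId \t bytesUsed
    val usage = sc.textFile("hdfs:///data/usage/")
      .map(_.split("\t"))
      .filter(_.length == 2)
      .map(fields => (fields(0), fields(1).toLong))

    // reduceByKey combines values map-side before the shuffle,
    // which is the main win over an equivalent groupByKey or MR job
    val totalPerUser = usage.reduceByKey(_ + _)
    totalPerUser.saveAsTextFile("hdfs:///data/usage_totals")

    sc.stop()
  }
}
```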
Environment: Java, MapReduce, Spark, HDFS, Hive, Pig, Linux, XML, MySQL, MySQL Workbench, Java 6, Eclipse, PL/SQL, SQL connector, Subversion.
Confidential, Durham, NC
Java/Hadoop Developer
Responsibilities:
- Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop.
- Involved in implementing High Availability and automatic failover to remove the NameNode single point of failure, utilizing ZooKeeper services.
- Developed Pig scripts to transform raw data into intelligent data as specified by business users.
- Wrote Shell and Python scripts for file validation and processing, job scheduling, distribution, and automation.
- Worked on the Hadoop cluster and used the data querying tool Hive to store and retrieve data.
- Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Flume.
- Developed Spark applications in Java, Scala, and Python (a minimal Scala example follows this list).
- Exported analyzed data from HDFS using Sqoop to generate reports.
- Imported and exported data into HDFS and Hive using Sqoop and Flume.
- Worked on the Oozie workflow engine to run multiple MapReduce jobs.
- Supported MapReduce programs running on the cluster.
- Worked with application teams to install Hadoop updates and upgrades as required.
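A minimal Spark application of the kind described above, shown in Scala; the input and output paths are illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LogWordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LogWordCount"))

    // Count word frequencies across consolidated log files in HDFS
    val counts = sc.textFile("hdfs:///logs/consolidated/")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///logs/word_counts")
    sc.stop()
  }
}
```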
Environment: Hadoop, MapReduce, HDFS, Pig, Sqoop, Hive, Java, Oracle, Eclipse and Shell/Python Scripting.
Confidential
Java Developer
Responsibilities:
- Reviewed requirements and analyzed their impact.
- Participated in requirement analysis and design of the application using UML/Rational Rose and Agile methodology.
- Involved in developing the application using Core Java, J2EE, and JSPs.
- Developed this web-based application on the J2EE framework, using Hibernate for persistence, Spring for dependency injection, and JUnit for testing.
- Used JSP to develop the front-end screens of the application.
- Designed and developed several SQL scripts, stored procedures, packages, and triggers for the database.
- Used indexing techniques in the database procedures to improve search performance.
- Involved in developing a web service client to get client details from third-party agencies.
- Developed nightly batch jobs that interfaced with external third-party state agencies.
- Developed test scripts for performance and accessibility testing of the application.
- Performed unit, system, and integration testing during the testing phase.
- Provided production support to maintain the application.
Environment: Java, J2EE, Struts Framework, JSP, Spring Framework, Hibernate, Oracle, Eclipse, Subversion, PL/SQL, WebSphere, UML, Windows.