Sr. Hadoop / Spark Developer Resume
Dallas, TX
SUMMARY:
- 8+ years of experience in the IT industry designing, developing, and maintaining web-based applications using Big Data technologies such as the Hadoop and Spark ecosystems, as well as Java/J2EE technologies.
- Excellent understanding of Hadoop architecture and its daemons (NameNode, DataNode, JobTracker, TaskTracker) and of HDFS and MapReduce concepts.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Impala, Solr, Elasticsearch, Oozie, ZooKeeper, Kafka, Spark, and Cassandra with the Cloudera and Hortonworks distributions.
- Hands-on experience across big data application phases, including data ingestion, data analytics, and data visualization.
- Experienced in writing MapReduce programs in Java to process large data sets using map and reduce tasks.
- In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
- Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data, and in performing data transformations using Spark Core.
- Expertise in developing real-time streaming solutions using Spark Streaming.
- Expertise in using Spark SQL with various data sources such as JSON, Parquet, and Hive.
- Hands-on experience using Spark MLlib for predictive analytics and customer segmentation in Spark Streaming applications.
- Experience in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Extended Hive and Pig core functionality with custom User-Defined Functions (UDFs), User-Defined Table-Generating Functions (UDTFs), and User-Defined Aggregate Functions (UDAFs).
- Hands-on experience setting up automated monitoring and escalation infrastructure for Hadoop clusters using Ganglia and Nagios.
- Hands-on experience working with Spark MLlib.
- Experienced in developing Spark programs using the Scala and Java APIs.
- Expertise in using Kafka as a messaging system to implement real-time streaming solutions.
- Implemented Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
- Expertise in using Flume to collect, aggregate, and load log data from multiple sources into HDFS.
- Scheduled various ETL processes and Hive scripts by developing Oozie workflows.
- Experienced in working with structured data using HiveQL, including join operations, Hive UDFs, partitions, bucketing, and internal/external tables.
- Experience handling various file formats such as Avro, SequenceFile, and Parquet.
- Proficient in various NoSQL databases such as Cassandra, MongoDB, and HBase.
- Good understanding of MPP query engines and databases; created tables and wrote queries in Impala and Greenplum.
- Experienced in using ZooKeeper to coordinate servers in clusters and to maintain data consistency.
- Good knowledge of Cloudera distributions and of AWS services including S3, EC2, and EMR.
- Worked on HBase for real-time analytics and experienced in using CQL to extract data from Cassandra tables.
- Experienced with Kerberos authentication for adding security to the cluster.
- Experienced with Cloudera Manager for monitoring the health and performance of the Hadoop cluster.
- Experienced in writing test cases and performing unit testing using frameworks such as JUnit, EasyMock, and Mockito.
- Hands-on experience in scripting for automation and monitoring using Shell, PHP, Python, and Perl.
- Strong knowledge of Informatica PowerCenter, data warehousing, and business intelligence.
- Good experience with Core Java and JEE technologies such as JDBC, Servlets, and JSP.
- Expert in developing web applications using the Struts, Hibernate, and Spring frameworks.
- Hands-on experience writing SQL and PL/SQL queries.
- Good understanding of and experience with software development methodologies such as Agile and Waterfall; performed unit, regression, white-box, and black-box testing.
- Experience with web services using XML, HTML, and SOAP.
- Worked with version control tools such as CVS, Git, and SVN.
- Experienced on projects using JIRA and build tools such as Maven, MSBuild, and Jenkins.
- Experience developing web applications using Java, JSP, Servlets, JavaScript, jQuery, AngularJS, Node.js, XML, SQL, PL/SQL, and JUnit, deployed on JBoss 4.2.3, WebLogic, Apache Tomcat, and WebSphere.
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
Languages: Java, Python, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++
NoSQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB
Methodologies: Agile, Waterfall
Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS, JSON and Node.js
Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j
Frameworks: Struts, Spring and Hibernate
App/Web Servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
RDBMS: Teradata, Oracle 9i/10g/11i, MS SQL Server, MySQL and DB2
Operating Systems: UNIX, Linux, Mac OS and Windows variants
ETL Tools: Talend, Informatica, Pentaho
PROFESSIONAL EXPERIENCE:
Confidential, Dallas, TX
Sr. Hadoop / Spark Developer
Responsibilities:
- Experienced in designing and deploying Hadoop clusters and various big data analytic tools, including Pig, Hive, Cassandra, Oozie, Sqoop, Kafka, Spark, and Impala, on the Cloudera distribution.
- Developed Pig scripts to help perform analytics on JSON and XML data.
- Created Hive tables (external and internal) with static and dynamic partitions and bucketed the tables to improve efficiency.
- Used HiveQL to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Performed data transformations by writing MapReduce and Pig scripts per business requirements.
- Configured Spark Streaming to consume streams of data from Kafka and store them in HDFS.
- Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing the data, and storing it in Cassandra (a simplified sketch follows this list).
- Good understanding of Cassandra architecture, including replication strategies, gossip, and snitches.
- Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping.
- Experience with NoSQL column-oriented databases such as Cassandra and their integration with the Hadoop cluster.
- Used Spark Streaming APIs to perform on-the-fly transformations and actions for building the common learner data model, which receives data from Kafka in near real time and persists it to Cassandra.
- Identified problem areas using Elasticsearch and Kibana with Logstash to import .csv files; used Solr over a Lucene index to provide full-text search for analysis and quantification.
- Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive.
- Work experience with cloud infrastructure such as Amazon Web Services (AWS).
- Designed and documented REST/HTTP and SOAP APIs, including JSON data formats and an API versioning strategy.
- Worked with the Scrum team to deliver agreed user stories on time in every sprint.
- Analyzed the performance of Spark streaming and batch jobs using Spark tuning parameters.
- Used the Log4j framework for logging debug, info, and error data.
- Developed Spark applications using Scala and Spark SQL for faster processing and testing.
- Developed custom UDFs in Java to extend Hive and Pig functionality.
- Imported data from RDBMS systems such as MySQL into HDFS using Sqoop.
- Developed Sqoop jobs to perform incremental imports into Hive tables.
- Implemented MapReduce counters to gather metrics on good and bad records.
- Involved in loading and transforming large sets of structured and semi-structured data.
- Worked with different file formats (ORC, Parquet, Avro) and compression codecs (GZIP, Snappy, LZO).
- Decreased database load for search by moving part of the search workload to Elasticsearch.
- Created data pipelines per business requirements and scheduled them using Oozie coordinators.
- Expertise in extracting, transforming, and loading data from Oracle, DB2, SQL Server, MS Access, Excel, flat files, and XML using Talend.
- Implemented Elasticsearch on the Hive data warehouse platform.
- Involved in analyzing log data to predict errors using Apache Spark.
- Experience using ORC, Avro, Parquet, RCFile, and JSON file formats, and developed UDFs for Hive and Pig.
- Experience with the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Worked with Amazon Web Services (AWS) cloud services such as EC2, S3, EMR, EBS, RDS, and VPC.
- Integrated MapReduce with HBase to bulk-import data into HBase using MapReduce programs.
- Used Impala and wrote queries to fetch data from Hive tables.
- Developed several MapReduce jobs using the Java API.
- Extracted data from Teradata into HDFS, databases, and dashboards using Spark Streaming.
- Well versed in database and data warehouse concepts such as OLTP, OLAP, and star and snowflake schemas.
- Read log files using Elasticsearch and Logstash, alerted users to issues, and saved alert details to MongoDB for analysis.
- Worked with Apache Solr to implement indexing and wrote custom Solr query segments to optimize search.
- Built near-real-time Solr indexes on HBase and HDFS.
- Developed Pig and Hive UDFs to implement business logic for processing data per requirements.
- Developed Oozie bundles to schedule Pig, Sqoop, and Hive jobs to create data pipelines.
- Implemented the project using Agile methodology and attended daily Scrum meetings.
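Illustrative only: a minimal Java sketch of the Kafka-to-Spark-Streaming-to-Cassandra flow described above, using the spark-streaming-kafka-0-10 integration and the DataStax Spark-Cassandra connector. The broker address, topic, consumer group, keyspace/table, and Event class are hypothetical placeholders, not project code.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

public class KafkaToCassandraJob {

    /** Hypothetical bean matching a Cassandra table with id and payload columns. */
    public static class Event implements Serializable {
        private String id;
        private String payload;
        public Event() { }
        public Event(String id, String payload) { this.id = id; this.payload = payload; }
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public String getPayload() { return payload; }
        public void setPayload(String payload) { this.payload = payload; }
    }

    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
                .setAppName("kafka-to-cassandra")
                .set("spark.cassandra.connection.host", "127.0.0.1"); // placeholder host
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");  // placeholder brokers
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "learner-model-consumer");   // placeholder group id
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct DStream from a Kafka topic (topic name is a placeholder).
        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("events"), kafkaParams));

        // For each micro-batch, convert the records into an RDD of Event and persist to Cassandra.
        stream.foreachRDD(rdd -> {
            JavaRDD<Event> events = rdd.map(record ->
                    new Event(record.partition() + "-" + record.offset(), record.value()));
            javaFunctions(events)
                    .writerBuilder("analytics", "events", mapToRow(Event.class)) // placeholder keyspace/table
                    .saveToCassandra();
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```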
Environment: Hadoop, Hive, HDFS, Pig, Sqoop, Oozie, Spark, Spark Streaming, Kafka, Apache Solr, Cassandra, Cloudera distribution, Java, Impala, web servers, Maven, MySQL, AWS, Agile/Scrum.
Confidential, New York City, NY
Big Data Engineer
Responsibilities:
- Processed web server logs by developing multi-hop Flume agents using the Avro sink and loaded the logs into MongoDB for further analysis.
- Implemented custom Flume interceptors to mask confidential data and filter unwanted records from the event payload.
- Implemented custom serializers to perform encryption using the DES algorithm.
- Developed collections in MongoDB and performed aggregations on them.
- Used Spark SQL to load JSON data into SchemaRDDs (DataFrames), loaded the results into Hive tables, and handled structured data with Spark SQL (see the sketch after this list).
- Used Spark SQL to load data into Hive tables and wrote queries to fetch data from those tables.
- Developed Spark programs using the Scala and Java APIs and performed transformations and actions on RDDs.
- Designed and implemented Spark jobs to support distributed data processing.
- Experienced in writing Spark applications in Scala and Python (PySpark).
- Used Apache NiFi to ingest data from IBM MQ message queues.
- Developed custom NiFi processors in Java, built with Maven, to add functionality for additional tasks.
- Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Created HBase tables, loaded data into them using HBase sinks, and performed analytics with Tableau.
- Created HBase tables and column families to store user event data.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Configured, monitored, and optimized Flume agents to capture web logs from the VPN server into the Hadoop data lake.
- Expertise in extracting, transforming, and loading data from Oracle, DB2, SQL Server, MS Access, Excel, flat files, and XML using Talend.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Experience working with Hadoop clusters using the Hortonworks distribution.
- Responsible for loading data from UNIX file systems to HDFS; installed and configured Hive and wrote Pig/Hive UDFs.
- Wrote, tested, and implemented Teradata FastLoad, MultiLoad, and BTEQ scripts, along with DML and DDL.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
- Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and performance analysis.
- Developed ETL processes using Spark, Scala, Hive, and HBase.
- Developed workflows using Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
- Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
- Developed Pig Latin scripts to analyze semi-structured data and conducted data analysis by running Hive queries and Pig scripts.
- Used codecs such as Snappy and LZO to store data in HDFS and improve performance.
- Expert knowledge of MongoDB NoSQL data modeling, tuning, disaster recovery, and backup.
- Created HBase tables to store data in variable formats coming from different legacy systems.
- Used Hive for transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Developed Sqoop jobs to load data from RDBMS into HDFS and Hive.
- Developed Oozie coordinators to schedule Pig and Hive scripts to create data pipelines.
- Involved in loading data from UNIX file systems and FTP to HDFS.
- Imported data from different sources, such as AWS S3 and the local file system, into Spark RDDs.
- Worked on Kerberos authentication to establish more secure network communication on the cluster.
- Performed troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Worked with network, database, application, and BI teams to ensure data quality and availability.
- Worked with Elastic MapReduce and set up environments on AWS EC2 instances.
- Experience maintaining the cluster on AWS EMR.
- Experienced with NoSQL databases such as HBase and MongoDB, and with the Hortonworks distribution of Hadoop.
- Developed ETL jobs to integrate data from various sources and load it into the warehouse using Informatica 9.1.
- Experienced in creating ETL mappings in Informatica.
- Experienced in working with various Informatica transformations such as Filter, Router, Expression, and Update Strategy.
- Scheduled the ETL jobs using the ESP scheduler.
- Worked in Agile methodology and actively participated in daily Scrum meetings.
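Illustrative only: a minimal Java sketch of the Spark SQL flow described above, loading JSON into a DataFrame (the newer equivalent of a SchemaRDD), persisting it as a Hive table, and querying it back. The paths, database, table, and column names are hypothetical placeholders.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonToHiveJob {
    public static void main(String[] args) {
        // Hive support is required for saveAsTable/spark.sql to work against the Hive metastore.
        SparkSession spark = SparkSession.builder()
                .appName("json-to-hive")
                .enableHiveSupport()
                .getOrCreate();

        // Load semi-structured JSON; Spark infers the schema (path is a placeholder).
        Dataset<Row> events = spark.read().json("hdfs:///data/raw/events/*.json");

        // Persist the structured result as a Hive table (database/table names are placeholders).
        events.write().mode("append").saveAsTable("analytics.user_events");

        // Query the Hive table back with Spark SQL (column names are placeholders).
        Dataset<Row> daily = spark.sql(
                "SELECT event_date, COUNT(*) AS cnt FROM analytics.user_events GROUP BY event_date");
        daily.show();

        spark.stop();
    }
}
```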
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, HBase, MongoDB, Flume, Apache Spark, Accumulo, Oozie, Kerberos, AWS, Tableau, Java, Informatica, Elasticsearch, Git, Maven.
Confidential, Fort Lauderdale, FL
Hadoop Developer
Responsibilities:
- Handled large amounts of data coming from different sources and was involved in HDFS maintenance and in loading structured and unstructured data.
- Imported and exported data into HDFS and Hive using Sqoop.
- Developed MapReduce jobs in Java to perform data cleansing and pre-processing (a minimal example follows this list).
- Migrated large amounts of data from various databases, such as Oracle, Netezza, and MySQL, to Hadoop.
- Responsible for creating Hive tables, loading data into them, and writing Hive queries.
- Performed data transformations in Hive.
- Wrote Hive queries to perform data analysis per business requirements.
- Created partitions and buckets on Hive tables to improve query performance.
- Optimized and performance-tuned Hive queries.
- Implemented complex transformations by writing UDFs in Pig and Hive.
- Loaded and transformed structured, semi-structured, and unstructured data.
- Ingested log data from various web servers into HDFS using Apache Flume.
- Implemented Flume agents for loading streaming data into HDFS.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Wrote several MapReduce jobs using the Java API.
- Scheduled jobs using the Oozie workflow engine.
- Good experience with Talend Open Studio for designing ETL jobs for data processing.
- Experience processing large volumes of data and executing processes in parallel using Talend functionality.
- Expertise in extracting, transforming, and loading data from Oracle, DB2, SQL Server, MS Access, Excel, flat files, and XML using Talend.
- Experienced in data analytics, web scraping, and data extraction in Python.
- Designed and implemented database cloning using Python and built backend support for applications using shell scripts.
- Worked with various compression techniques such as GZIP and LZO.
- Designed and implemented batch jobs using Sqoop, MR2, Pig, and Hive.
- Implemented HBase on top of HDFS to perform real-time analytics.
- Handled Avro data files using Avro tools and MapReduce.
- Developed data pipelines using chained mappers.
- Developed custom loaders and storage classes in Pig to work with various data formats such as JSON, XML, and CSV.
- Actively involved in SDLC phases (design, development, testing) and code reviews.
- Actively involved in Scrum meetings and followed Agile methodology for implementation.
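Illustrative only: a minimal, map-only MapReduce sketch in Java of the kind of data cleansing and pre-processing job described above. The delimiter, expected field count, and counter names are hypothetical placeholders.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Map-only cleansing job: drops malformed records and trims fields before downstream loads. */
public class DataCleansingJob {

    public static class CleanseMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 5; // placeholder for the real record layout

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length != EXPECTED_FIELDS) {
                context.getCounter("cleansing", "bad_records").increment(1);
                return; // skip malformed record
            }
            StringBuilder cleaned = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) {
                    cleaned.append(',');
                }
                cleaned.append(fields[i].trim());
            }
            context.getCounter("cleansing", "good_records").increment(1);
            context.write(NullWritable.get(), new Text(cleaned.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "data-cleansing");
        job.setJarByClass(DataCleansingJob.class);
        job.setMapperClass(CleanseMapper.class);
        job.setNumReduceTasks(0); // map-only: cleansed records go straight to HDFS
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```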
Environment: HDFS, MapReduce, Hive, Flume, Pig, Sqoop, Oozie, HBase, RDBMS, flat files, MySQL, CSV, Avro data files.
Confidential
Java Developer
Responsibilities:
- Actively involved from the start of the project, from requirements gathering through quality assurance testing.
- Coded and developed a multi-tier architecture in Java, J2EE, and Servlets.
- Conducted analysis, requirements study, and design according to various design patterns, and developed features according to the use cases, taking ownership of them.
- Used design patterns such as Command, Abstract Factory, Factory, and Singleton to improve system performance; analyzed critical coding defects and developed solutions.
- Developed a configurable front end using Struts; also involved in component-based development of features reusable across modules.
- Designed, developed, and maintained the data layer using the Hibernate ORM framework.
- Used Hibernate for the persistence layer and wrote stored procedures for data retrieval, storage, and updates in the Oracle database (see the sketch after this list).
- Developed batch jobs that run at scheduled times to implement business logic on the Java platform.
- Developed and deployed archive files (EAR, WAR, JAR) using the Ant build tool.
- Applied software development best practices for object-oriented design and methodologies throughout the object-oriented development cycle.
- Responsible for developing the SQL queries required for JDBC.
- Designed the database, worked on DB2, and executed DDL and DML statements.
- Actively participated in architecture framework design, coding, and test plan development.
- Strictly followed the Waterfall development methodology for implementing projects.
- Thoroughly documented the detailed process flow with UML diagrams and flow charts for distribution across various teams.
- Involved in developing training presentations for developers (offshore support), QA, and production support.
- Presented the logical and physical process flow to various teams using PowerPoint and Visio diagrams.
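Illustrative only: a minimal Java sketch of a Hibernate-backed data layer of the kind described above, using the Hibernate 3.x-style Session API. The Order entity, its Order.hbm.xml mapping, and the identifier type are hypothetical placeholders.

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.Configuration;

// Hypothetical entity; assumed to be mapped to an ORDERS table via Order.hbm.xml.
class Order implements java.io.Serializable {
    private Long id;
    private String customerName;
    private double amount;

    public Order() { }

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getCustomerName() { return customerName; }
    public void setCustomerName(String customerName) { this.customerName = customerName; }
    public double getAmount() { return amount; }
    public void setAmount(double amount) { this.amount = amount; }
}

// DAO for the data layer: opens a Session per call and wraps writes in a transaction.
public class OrderDao {

    private static final SessionFactory SESSION_FACTORY =
            new Configuration().configure("hibernate.cfg.xml").buildSessionFactory();

    /** Persists a new Order and returns its generated identifier (assumed to be a Long). */
    public Long save(Order order) {
        Session session = SESSION_FACTORY.openSession();
        Transaction tx = session.beginTransaction();
        try {
            Long id = (Long) session.save(order);
            tx.commit();
            return id;
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }

    /** Loads an Order by primary key, or returns null if it does not exist. */
    public Order findById(Long id) {
        Session session = SESSION_FACTORY.openSession();
        try {
            return (Order) session.get(Order.class, id);
        } finally {
            session.close();
        }
    }
}
```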
Environment: Java JDK 1.5, J2EE, Informatica, Oracle 11g (TOAD and SQL Developer), Servlets, JBoss application server, Waterfall, JSPs, EJBs, DB2, RAD, XML, web server, JUnit, Hibernate, MS Access, Microsoft Excel.