Big Data Developer Resume
Washington, DC
SUMMARY
- Around 8 years of IT experience in analysis, design, and development using the Big Data ecosystem (Pig, Hive, Impala, Spark/Scala), Java, and J2EE across all phases of the iterative Software Development Life Cycle (SDLC).
- Experience working in environments using Agile (SCRUM), RUP, and Test-Driven Development methodologies.
- Good knowledge of Amazon AWS concepts such as EMR and EC2 web services, which provide fast and efficient processing.
- Good understanding of NoSQL databases and hands-on experience writing applications on MongoDB.
- Professional experience as an ETL developer in an Enterprise Data Warehousing environment; hands-on experience with Informatica and Teradata.
- Expertise in using J2EE application servers such as IBM WebSphere and JBoss and web servers like Apache Tomcat; experienced with the MapReduce programming model for analyzing data stored in Hadoop.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop, and working with Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experience with different Hadoop distributions such as Cloudera (CDH3 & CDH4) and Hortonworks (HDP).
- Experience working with version control tools such as Rational Team Concert, Harvest, ClearCase, SVN, and GitHub.
- Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
- Experience developing enterprise applications with MVC architecture on application and web servers, and analyzing large data sets by writing PySpark scripts and Hive queries.
- Involved in moving log files generated from various sources into HDFS and Spark for further processing; experienced in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology.
- Experience with R and Python for statistical computing, as well as MLlib (Spark), MATLAB, Excel, Minitab, SPSS, and SAS.
- Experience developing web services with XML-based protocols such as SOAP, UDDI, and WSDL (using Axis), and in deployment and integration using SQL, big data ecosystems, and data management and visualization tools.
- Expertise in using major Hadoop ecosystem components such as HDFS, YARN, MapReduce, Hive, Sqoop, HBase, Spark, Spark SQL, Oozie, ZooKeeper, and Hue.
- Extensive experience in installing, configuring and using Big Data ecosystem components like Hadoop MapReduce, HDFS, Sqoop, Pig, Hive, Impala, Spark and Zookeeper.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
- Good knowledge of API managers such as Azure API Management and Apigee, and experience writing and deploying Oozie workflows and coordinators.
- Highly skilled in integrating Amazon Kinesis streams with Spark Streaming applications to build long-running real-time applications.
- Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse.
TECHNICAL SKILLS
Big Data Tools & Technologies: Hortonworks HDP, Hive, Apache Spark, SQL, MapReduce, Pig, HBase, Cassandra, NoSQL, Sqoop, Oozie, YARN, Tableau
Hadoop Ecosystems: Spark Core, Kafka, Spark SQL, HDFS, YARN, Sqoop, Pig, Hive, Oozie, Flume, MapReduce, Storm
Development and Build Tools: Eclipse, NetBeans, IntelliJ, ANT, Maven, IVY, TOAD, SQL Developer
Databases: HBase, Cassandra, MongoDB, Oracle 9i/10g/11g, Microsoft SQL Server 2005/2008 R2/2012, MySQL, ODI, SQL/PL-SQL
Operating Systems: LINUX, Ubuntu, Windows
Programming /Scripting languages: Java, Python, Scala, JavaScript, SQL, Shell Scripting
Security Management: Hortonworks Ambari, Cloudera Manager, SSL/TLS, Kerberos
Data Modeling Tools: Erwin 7.3/7.1/4.1/4.0
IDEs: Eclipse, IntelliJ, Spark Eclipse
Project Management Tools & DevOps: Rally, JIRA, Jenkins, Bitbucket (GIT)
Frameworks: JUnit and Jest, Spring, Hibernate, Kafka, Flask, Django, Android, Zeplin, Akka, ActiveMQ, WSO2 ESB, WSO2 CEP, ORC
PROFESSIONAL EXPERIENCE
Confidential - Washington, DC
Big Data Developer
Responsibilities:
- Developed ETL data pipelines using Sqoop, Spark, Spark SQL, Scala, and Oozie; used Spark for interactive queries and streaming data processing, and integrated it with popular NoSQL databases.
- Worked with AWS services including IAM, Data Pipeline, EMR, S3, and EC2; developed batch scripts to fetch data from S3 storage and apply the required transformations.
- Developed Spark code using Scala and Spark SQL for faster data processing, and created Oozie workflows to run multiple Spark jobs.
- Developed scripts to automate ETL execution using UNIX shell scripts, along with Terraform scripts that automate step execution in EMR to load data into ScyllaDB.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Denormalized data coming from Netezza as part of the transformation process and loaded it into NoSQL databases and MySQL.
- Developed a Kafka consumer in Scala to consume data from Kafka topics; implemented data quality checks using Spark Streaming and flagged records as bad or passable.
- Wrote real-time processing and core jobs in Scala using Spark Streaming, with Kafka as the data pipeline.
- Good knowledge of setting up batch, slide, and window intervals in Spark Streaming using Scala.
- Implemented Spark SQL with various data sources such as JSON, Parquet, ORC, and Hive; loaded the data into Spark RDDs and performed in-memory computation to generate the output.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building common learner data models that consume data from Kafka in near real time and persist it to Cassandra (a minimal sketch of this pattern follows this list).
- Converted MapReduce programs into Spark transformations using RDDs in Scala, and developed Spark scripts with Scala shell commands as per the requirements.
- Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing them, and storing the results in Cassandra.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data, and executed Scala-based Spark projects with spark-submit.
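A minimal sketch of the Kafka-to-Cassandra streaming pattern described above, assuming Spark Streaming with the kafka-0-10 integration and the DataStax spark-cassandra-connector; the broker address, topic name, keyspace, table, and comma-delimited record layout are illustrative assumptions rather than project specifics.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._

object LearnerEventStream {
  def main(args: Array[String]): Unit = {
    // 10-second batch interval; Cassandra host is a placeholder
    val conf = new SparkConf()
      .setAppName("LearnerEventStream")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "learner-stream",
      "auto.offset.reset" -> "latest"
    )

    // Direct DStream from a hypothetical "learner-events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams)
    )

    // Parse comma-delimited records, drop malformed rows, and persist to Cassandra
    stream.map(_.value.split(","))
      .filter(_.length == 3)
      .map(f => (f(0), f(1), f(2).toLong))
      .saveToCassandra("learner_ks", "events", SomeColumns("id", "event", "ts"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```

A job like this would be packaged as a JAR and launched with spark-submit, as noted in the bullets above.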
Environment: Big Data 3.0, SDLC, Azure, HDFS, Scala, SQL, Hive2.3, Spark, Kafka1.1, Hadoop3.0, Apache NiFi, ETL, Sqoop1.4, Flume1.8, PySpark, Elasticsearch, Oozie4.3, Jenkins, XML, MySQL, GitHub, Hortonworks, Cloudera, MongoDB.
Confidential - Germantown, MD
Big Data Developer
Responsibilities:
- Worked on Hadoop ecosystem components including Hive, MongoDB, ZooKeeper, and Spark Streaming on the MapR distribution; developed big data solutions focused on pattern matching and predictive modeling.
- Analyzed Hadoop clusters and various big data analytic tools, including Pig, the HBase database, and Sqoop.
- Participated in all aspects of Software Development Life Cycle (SDLC) and Production troubleshooting, Software testing using Standard Test Tool.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning; identified job dependencies to design workflows for Oozie and YARN resource management.
- Involved in writing Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirement.
- Extensively used ETL to transfer and extract data from source files (Flat files and DB2) and load the data into the target database.
- Loaded the data into Spark RDDs and performed in-memory computation to generate the output as per the requirements.
- Designed solutions for various system components using Microsoft Azure; performed data analytics in Hive and exported the resulting metrics back to an Oracle database using Sqoop.
- Created Hive tables, loaded claims data from Oracle using Sqoop, and loaded the processed data into the target database.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded the data into HDFS.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data, and built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
- Worked with Apache NiFi as an ETL tool for batch and real-time processing; moved relational database data into Hive dynamic-partition tables via staging tables using Sqoop (see the sketch after this list).
- Identified job dependencies to design workflows for Oozie and YARN resource management; used GitHub as the repository for committing and retrieving code, and Jenkins for continuous integration.
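A brief sketch of the staging-to-dynamic-partition Hive load mentioned above, using Spark SQL with Hive support; the claims_staging source table, the claims target table, its columns, and the load_date partition key are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object StagingToPartitionedHive {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL read and write managed Hive tables
    val spark = SparkSession.builder()
      .appName("StagingToPartitionedHive")
      .enableHiveSupport()
      .getOrCreate()

    // Enable dynamic partitioning so the partition value comes from the data itself
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Target table partitioned by load_date (names are illustrative)
    spark.sql(
      """CREATE TABLE IF NOT EXISTS claims (claim_id STRING, amount DOUBLE)
        |PARTITIONED BY (load_date STRING)
        |STORED AS ORC""".stripMargin)

    // Move rows from the Sqoop-loaded staging table into the partitioned table
    spark.sql(
      """INSERT INTO TABLE claims PARTITION (load_date)
        |SELECT claim_id, amount, load_date FROM claims_staging""".stripMargin)

    spark.stop()
  }
}
```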
Environment: Hortonworks Hadoop, Cassandra, Flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, Sqoop, Hive, Oozie, Ambari, SAS, PySpark, SPSS, Unix Shell Scripts, ZooKeeper, SQL, MapReduce, Pig.
Confidential - Frisco, TX
Hadoop Developer
Responsibilities:
- Developed Spark scripts using Scala shell commands as per the requirements, and worked with technology and business groups on the Hadoop migration strategy.
- Implemented the project using Agile methodology with daily scrum meetings; used Amazon AWS services such as EMR and EC2 for fast and efficient processing of big data.
- Migrated legacy MapReduce programs into Spark transformations using Spark and Scala; configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; loaded the data into Spark RDDs and performed in-memory computation to generate the output.
- Developed Pig scripts to help perform analytics on JSON and XML data, and handled importing of data from machine logs using Flume.
- Worked with Apache Solr to implement indexing and wrote custom Solr query segments to optimize search.
- Implemented MapReduce counters to gather metrics on good and bad records, and used the Cloudera distribution for data transformation and preparation.
- Configured Storm to load data from MySQL to HBase using JMS; used Maven extensively to build JAR files of MapReduce programs and deploy them to the cluster.
- Extracted data from Teradata into HDFS, databases, and dashboards using Spark Streaming; installed the Oozie workflow engine to run multiple MapReduce, Hive HQL, and Pig jobs.
- Collected log data from web servers and integrated it into HDFS using Flume; explored Spark to improve the performance and optimization of existing Hadoop algorithms.
- Imported data from sources such as HDFS and HBase into Spark RDDs; experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the sketch after this list).
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Used Pig to perform data validation on data ingested via Sqoop and Flume, and pushed the cleansed data set into MongoDB.
- Extracted files from CouchDB through Sqoop, placed them in HDFS, and processed them; implemented a distributed messaging queue using Apache Kafka and ZooKeeper to integrate with Cassandra.
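A short sketch of the HDFS ingest-and-cleanse flow described above; Spark accumulators stand in for the MapReduce good/bad record counters mentioned earlier, and the input path, pipe delimiter, and expected field count are illustrative assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LogCleanse {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LogCleanse"))

    // Accumulators play the role of good/bad record counters
    val goodRecords = sc.longAccumulator("goodRecords")
    val badRecords  = sc.longAccumulator("badRecords")

    // Hypothetical HDFS path holding pipe-delimited machine logs
    val lines = sc.textFile("hdfs:///data/machine_logs/*")

    val cleansed = lines.map(_.split("\\|")).filter { fields =>
      if (fields.length == 4) { goodRecords.add(1); true }
      else { badRecords.add(1); false }
    }

    // Write the cleansed records back to HDFS for downstream jobs
    cleansed.map(_.mkString(",")).saveAsTextFile("hdfs:///data/machine_logs_clean")

    // Counter values are populated once the action above has run
    println(s"good=${goodRecords.value} bad=${badRecords.value}")
    sc.stop()
  }
}
```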
Environment: Spark, Hadoop3.0, Agile, AWS, Oozie4.3, Cassandra, MapReduce, Apache Pig0.17, Scala, Hive2.3, HDFS, Apache Flume1.8, HBase1.2, MongoDB, Sqoop1.4, ZooKeeper.
Confidential
Software Developer
Responsibilities:
- Worked extensively on system analysis, design, and development using J2EE architecture; actively participated in the requirements gathering, analysis, design, and testing phases.
- Developed the application using Spring Framework that leverages classical Model View Controller (MVC) architecture.
- Involved in Software Development Life cycle starting from requirements gathering and performing OOA and OOD.
- Designed and created components for the company's object framework using best practices and design Patterns such as Model-View-Controller (MVC).
- Developed user interfaces using HTML, XML, CSS, JSP, JavaScript, and Struts Tag Libraries, and defined common page layouts using custom tags.
- Debugged the application by traversing the DOM and DOM functions using Firebug in Firefox and the IE Developer Toolbar in Internet Explorer.
- Implemented Struts MVC paradigm components such as Action Mapping, Action class, Action Form, the Validation Framework, Struts Tiles, and Struts Tag Libraries.
- Involved in writing SQL Queries, Stored Procedures and used JDBC for database connectivity with MySQL Server.
- Developed the presentation layer using CSS and HTML based on Bootstrap for cross-browser support; used Spring Core and the Spring Web framework and created numerous classes for the backend.
- Developed web pages using HTML and Bootstrap; used PL/SQL for queries and stored procedures against the backend RDBMS.
- Established continuous integration with JIRA and Jenkins; used Hibernate to manage transactions (update, delete) and wrote complex SQL and HQL queries.
- Developed data mapping to create a communication bridge between various application interfaces using XML, and XSL.
- Used Microsoft Visio to develop Use Case diagrams, Sequence diagrams, and Class diagrams in the design phase.
- Provided production support, including handling tickets and providing resolutions; created database objects such as tables, sequences, views, triggers, stored procedures, functions, and packages.
Environment: Java, Struts, Spring, Hibernate 3.0, JSP, JavaScript, HTML, XML, jQuery, Oracle, Eclipse, JBoss Application Server, ANT, CVS, and SQL.