Sr. Spark/Hadoop/Trifacta Developer Resume
TX
SUMMARY:
- 8+ years of professional IT experience in all phases of the Software Development Life Cycle, including hands-on experience in Java/J2EE technologies and Big Data analytics.
- 4+ years of experience in the ingestion, storage, querying, processing, and analysis of Big Data, with hands-on Hadoop ecosystem development including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, ZooKeeper, Sqoop, Flume, Oozie, and AWS.
- Extensive experience working with Teradata, Oracle, Netezza, SQL Server, and MySQL databases.
- Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
- Strong experience working with Hadoop distributions such as Cloudera, Hortonworks, MapR, and Apache.
- Experience in installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera (CDH 5.x) distributions and on Amazon Web Services (AWS).
- Experience with AWS services such as EMR, EC2, S3, CloudFormation, and Redshift for fast, efficient processing of Big Data.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, MapReduce, HDFS Federation (Hadoop Gen2), High Availability, and YARN, plus a good understanding of workload management, scalability, and distributed platform architectures.
- Good understanding of Python programming, data mining, and machine learning techniques.
- Strong experience and knowledge of real-time data analytics using Storm, Kafka, Flume, and Spark.
- Experience in troubleshooting errors in Pig, Hive and MapReduce.
- Upgraded existing MongoDB instances from version 2.4 to 2.6, upgrading security roles and adopting newer features.
- Responsible for performing reads and writes in Cassandra from a web application using Java JDBC connectivity.
- Experience in extending Hive and Pig core functionality with custom UDFs.
- Debugged MapReduce jobs using counters and MRUnit testing.
- Good understanding of machine learning algorithms in Spark such as classification, clustering, and regression.
- Good understanding of Spark Streaming with Kafka for real-time processing.
- Extensive experience working with Spark features such as RDD transformations, Spark MLlib, and Spark SQL.
- Experienced in moving data from different sources using Kafka producers and consumers, and in pre-processing data using Storm topologies.
- Experienced in migrating ETL transformations to Pig Latin scripts, including transformations and join operations.
- Good knowledge of streaming data from sources such as log files, JMS, and applications into HDFS using Flume sources.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Worked on Docker-based containerized applications.
- Knowledge of data warehousing and ETL tools such as Talend, Power BI, and Trifacta.
- Experienced with cluster monitoring tools such as Cloudera Manager and Ambari.
- Experience testing MapReduce programs using MRUnit and JUnit.
- Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, EJB.
- Expertise in developing responsive front-end components with JSP, HTML, XHTML, JavaScript, DOM, Servlets, JSF, NodeJS, Ajax, jQuery, and AngularJS.
- Experience with version control tools such as SVN and Git (GitHub), JIRA/Mingle for issue tracking, and Crucible for code reviews.
- Worked with tools and IDEs such as Eclipse, IBM Rational, Visio, Apache Ant, MS Office, PL/SQL Developer, and SQL*Plus.
- Experience with application servers such as JBoss, Tomcat, WebLogic, and IBM WebSphere.
- Experience working in an onsite-offshore model.
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Storm, Drill, Ambari, Mahout, MongoDB, Cassandra, Avro, Parquet and Snappy.
Hadoop Distributions: Cloudera, MapR, Hortonworks
Languages: Java, Scala, Python, SQL, HTML, JavaScript and C/C++
No SQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
Web Design Tools: HTML, CSS, JavaScript, jQuery and AngularJS
Development/Build Tools: Eclipse, Ant, Maven, Gradle, IntelliJ, JUnit and Log4j
Frameworks: Struts, Spring and Hibernate
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
RDBMS: Teradata, Oracle 9i,10g,11i, MS SQL Server, MySQL and DB2
Operating systems: UNIX, LINUX, Mac OS and Windows Variants
ETL Tools: Power BI, Trifacta (highest-level certified) and Tableau
PROFESSIONAL EXPERIENCE:
Confidential, TX.
Sr. Spark/Hadoop/Trifacta Developer
Responsibilities:
- Highest-level (Wrangler & Knight) certified in Trifacta.
- Developed MapReduce-style jobs in Spark for log analysis, analytics, and data cleaning (an illustrative sketch of such a job follows this list).
- Perform big data processing using Hadoop, MapReduce, Sqoop, Oozie, and Impala.
- Extensive experience in data analytics and data wrangling using tools such as Trifacta.
- Working knowledge of Trifacta API access to push wrangled data to analytical applications such as Power BI.
- Experience in HDFS and Hadoop infrastructure.
- Extensive experience in SQL, including T-SQL on SQL Server.
- Worked on cloud deployment of application and infrastructure configuration.
- Provided data analytics to support decisions for high-priority enterprise initiatives involving IT/product development.
- Provided technical expertise and guidance for data management, quality, and reporting functions.
- Imported data from SQL databases into HDFS using Sqoop.
- Developed and designed a 5-node Hadoop cluster for sample data analysis.
- Regularly tuned Hive and Pig queries to improve data processing and retrieval.
- Created visualizations and reports for the business intelligence team using Trifacta and Power BI.
- Experience in cluster maintenance, including adding and removing nodes, using tools like Cloudera Manager Enterprise.
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
- Screen Hadoop cluster job performances and capacity planning.
- Help optimize and integrate new infrastructure via continuous integration methodologies.
- Reported defects to the development team or manager and drove them to closure.
- Have an appetite to learn and implement new technologies and solutions.
- Experience in configuring and deploying Virtual machines in cloud.
- Moved files between HDFS and AWS S3 and worked with S3 buckets in AWS.
- Integrated data into MySQL from extracts provided by the client.
- Imported files from various RDBMS sources, mainly MySQL, into Trifacta.
- Analyzed data across databases with differing standards and schemas and recommended the best solution to the client.
- Imported, deployed, and exported/loaded final output files to HDFS or to .csv/.xls format using Trifacta.
- Transferred data to the client server via FTP using WinSCP.
- Working experience in Agile methodology.
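The log-cleaning work described above typically took the shape below. This is a minimal illustrative sketch in Scala, not project code; the HDFS paths and the tab-separated (timestamp, level, message) log layout are assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a Spark batch log-cleaning job.
// Paths and the tab-separated (timestamp, level, message) layout are assumed.
object LogCleaningJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("log-cleaning").getOrCreate()
    import spark.implicits._

    val raw = spark.read.textFile("hdfs:///data/raw/app-logs/")   // assumed landing path

    val cleaned = raw
      .map { line =>
        val f = line.split("\t", -1)
        if (f.length >= 3) (f(0), f(1).trim.toUpperCase, f(2).trim) else ("", "", "")
      }
      .filter(_._1.nonEmpty)                                      // drop malformed lines
      .toDF("event_time", "level", "message")

    // Keep only the records downstream reports care about and land them as Parquet.
    cleaned
      .filter($"level".isin("WARN", "ERROR"))
      .write.mode("overwrite")
      .parquet("hdfs:///data/curated/app-logs/")                  // assumed output path

    spark.stop()
  }
}
```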
Environment: AWS (initially), Azure, Cloudera, Hadoop (HDFS, YARN, Spark), Linux, Trifacta, Python, Azure DevOps, MySQL, T-SQL, Power BI.
Confidential, WI.
Hadoop Developer
Responsibilities:
- Imported and exported data between the Hadoop data lake and relational systems such as Oracle and MySQL using Sqoop.
- Created Kafka topics and partitions and wrote custom partitioner classes.
- Experienced in writing Spark applications in Scala and Python (PySpark).
- Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing them, and storing the results in Cassandra (see the first sketch after this list).
- Experience in building real-time data pipelines with Kafka Connect and Spark Streaming.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Processed and transferred data from Kafka into HDFS through Spark Streaming APIs.
- Built Cassandra nodes on AWS and set up the Cassandra cluster using Ansible automation.
- Worked extensively with Amazon Web Services (AWS) offerings such as EC2, S3, EMR, EBS, RDS, and VPC.
- Used a highly available AWS environment to launch applications in different regions and implemented CloudFront with AWS Lambda to reduce latency.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the RDBMS through Sqoop (see the second sketch after this list).
- Developed Oozie bundles to schedule Pig, Sqoop, and Hive jobs and create data pipelines.
- Developed Hive queries to analyze the data and generate the end reports used by business users.
- Wrote extensive Hive queries to transform the data used by downstream models.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers/sensors.
- Experience in writing and tuning extensive Impala queries and creating views for ad hoc and business processing.
- Designed solutions for various system components using Microsoft Azure.
- Wrote a generic data quality check framework in Impala for use by the application.
- Generated various marketing reports using Tableau with Hadoop as a source for data.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark).
- Involved in Cassandra data modelling and building efficient data structures.
- Wrote a Storm topology to emit data into Cassandra.
- Understanding of Kerberos authentication in Oozie workflows for Hive and Cassandra.
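First sketch: the Kafka-to-Cassandra streaming path described above, in minimal Scala form using the spark-streaming-kafka-0-10 and spark-cassandra-connector APIs. The topic name, keyspace, table, column layout, and connection hosts are assumptions, not project values.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

object KafkaToCassandra {
  // Hypothetical event record; the real column layout is not given in the resume.
  case class Event(deviceId: String, ts: Long, value: Double)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-cassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1")   // assumed Cassandra contact point
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",               // assumed broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "event-consumers",
      "auto.offset.reset"  -> "latest"
    )

    // Direct DStream over the assumed "events" topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

    // Parse each CSV-style record into an Event and persist each micro-batch to Cassandra.
    stream
      .map(_.value.split(","))
      .filter(_.length == 3)
      .map(f => Event(f(0), f(1).toLong, f(2).toDouble))
      .saveToCassandra("telemetry", "events", SomeColumns("device_id", "ts", "value"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```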
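Second sketch: a DataFrame aggregation with a custom Scala UDF of the kind described above. The Hive table, column names, and staging path are assumptions; in the project, the export back to the RDBMS was performed with Sqoop, so only the staging step is shown here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{sum, udf}

// Minimal sketch of a Spark/Scala aggregation with a custom UDF; not project code.
object DailyAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-aggregation")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Hypothetical UDF: normalise free-form region codes before grouping.
    val normalizeRegion = udf((s: String) => Option(s).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

    val orders = spark.table("sales.orders")                 // assumed Hive table
    val daily = orders
      .withColumn("region", normalizeRegion($"region"))
      .groupBy($"region", $"order_date")
      .agg(sum($"amount").as("total_amount"))

    // Stage the aggregate as CSV on HDFS; a Sqoop export from this directory
    // (sqoop export --connect <jdbc-url> --table <table> --export-dir <dir>) pushes it to the RDBMS.
    daily.write.mode("overwrite").csv("hdfs:///staging/daily_totals/")

    spark.stop()
  }
}
```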
Environment: Hadoop, Hive, Impala, Oracle, Spark, Python, Pig, Sqoop, Oozie, MapReduce, Git, HDFS, Cassandra, Apache Kafka, Storm, Linux, Tableau, Solr, Confluence, Jenkins, Jira
Confidential, Stamford, CT.
Hadoop Developer
Responsibilities:
- Moved files between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
- Experience in creating batch and real-time pipelines using Spark as the main processing framework.
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark, Hive, and HBase.
- Collected JSON data from an HTTP source and developed Spark APIs that perform inserts and updates on Hive tables (a brief sketch follows this list).
- Experience with Cloudera Hadoop upgrades and patches and installation of ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
- Developed optimal strategies for distributing weblog data over the cluster; imported and exported the stored weblog data into HDFS and Hive using Sqoop.
- Used Amazon CloudWatch to monitor and track resources on AWS.
- Worked on migrating MapReduce programs into Spark transformations using Spark with Scala.
- Implemented sample Spark programs in Python using PySpark.
- Designed a reporting application that uses Spark SQL to fetch data from HBase and generate reports.
- Extensively used Spark SQL and PySpark APIs to query and transform data residing in Hive.
- Responsible for developing the data pipeline using Sqoop, Flume, and Pig to extract data from weblogs and store it in HDFS.
- Loaded DStream data into Spark RDDs and performed in-memory computation to generate output responses.
- Handled continuous streaming data from different sources using Flume, with HDFS as the destination.
- Loaded data into HBase using both bulk and non-bulk loads.
- Designed and developed ETL workflows in Java for processing data in HDFS/HBase, orchestrated with Oozie.
- Hands-on experience loading generated HFiles into HBase for faster access to a large customer base without a performance hit.
- Used ZooKeeper to coordinate servers in clusters and to maintain data consistency.
- Used Oozie operational services for batch processing and for scheduling workflows dynamically.
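The JSON-to-Hive ingestion described above is sketched below in minimal Scala form. It assumes the JSON pulled from the HTTP source has already been landed on HDFS; the landing path and the Hive database/table names are hypothetical, and only the append path is shown.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Minimal sketch of loading landed JSON into a Hive table; not project code.
object JsonToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-to-hive")
      .enableHiveSupport()
      .getOrCreate()

    // Schema is inferred from the JSON documents in the assumed landing directory.
    val events = spark.read.json("hdfs:///landing/http-feed/")

    // New records are appended to the target Hive table; in-place updates in the role
    // were handled separately (e.g. partition overwrites) and are not shown here.
    events.write
      .mode(SaveMode.Append)
      .saveAsTable("analytics.http_events")

    spark.stop()
  }
}
```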
Environment: AWS (EMR, EC2, S3), Cloudera, MapReduce, Pig, Hive, Sqoop, Flume, Pyspark, Spark, Scala, Java, HBase, Apache Avro, Oozie, Zookeeper, Elastic Search, Kafka, Python, JIRA, CVS and Eclipse.
Confidential
Java Developer
Responsibilities:
- Coded front-end components using HTML, JavaScript, and jQuery; back-end components using Java, Spring, and Hibernate; service-oriented components using RESTful and SOAP-based web services; and rules-based components using JBoss Drools.
- Involved in design and development phases of Software Development Life Cycle (SDLC).
- Proficient in writing SQL queries and stored procedures for multiple databases (Oracle and SQL Server 2005).
- Integrated Spring dependency injection across application layers, with Hibernate as the O/R mapping tool, for rapid development and ease of maintenance.
- Wrote stored procedures in PL/SQL and performed query optimization to achieve faster indexing and make the system more scalable.
- Worked with Struts MVC objects such as ActionServlet, controllers, validators, web application context, handler mappings, message resource bundles, and form controllers, and used JNDI for lookup of J2EE components.
- Implemented the Connectivity to the Database Server Using JDBC.
- Developed RESTful web services using Spring IoC to give users a way to run the job and generate daily status reports.
- Developed and exposed the SOAP web services by using JAX-WS, WSDL, AXIS, JAXP and JAXB.
- Configured domains in production, development and testing environments using configuration wizard.
- Created SOAP Handler to enable authentication and audit logging during Web Service calls.
- Created Service Layer API's and Domain objects using Struts.
- Used AJAX and JavaScript for validation and for integrating server-side business components on the client side within the browser.
- Used RESTFUL Services to interact with the Client by providing the RESTFUL URL mapping.
- Implemented the project using Agile Scrum methodology, participating in daily stand-up meetings, sprint showcases, and sprint retrospectives.
- Developed user interface using JSP, JSP Tag libraries, and Java Script to simplify the complexities of the application.
- Developed a Dojo based front end including forms and controls and programmed event handling.
- Used XSLT to transform XML data structures into HTML pages.
- Deployed EJB Components on Tomcat. Used JDBC API for interaction with Oracle DB.
- Developed the UI panels using JSF, XHTML, CSS, DOJO and jQuery.
Environment: Java 6 - JDK 1.6, JEE, Spring 3.1 framework, Spring Model View Controller (MVC), Java Server Pages (JSP) 2.0, Servlets 3.0, JDBC4.0, AJAX, Web services, Rest API, JSON, Java Beans, jQuery, JavaScript, Oracle 10g, JUnit, HTML Unit, XSLT, HTML/DHTML.
Confidential
Java Developer
Responsibilities:
- Created the database, user, environment, activity, and class diagrams for the project (UML).
- Implemented the database using the Oracle database engine.
- Designed and developed a fully functional, generic n-tier J2EE application platform; the environment was Oracle-technology driven. The entire infrastructure application was developed using Oracle JDeveloper in conjunction with Oracle ADF-BC and Oracle ADF Rich Faces.
- Created an entity object (business rules and policy, validation logic, default value logic, security).
- Created View objects, View Links, Association Objects, Application modules with data validation rules (Exposing Linked Views in an Application Module), LOV, dropdown, value defaulting, transaction management features.
- Web application development using J2EE: JSP, Servlets, JDBC, Java Beans, Struts, Ajax, JSF, JSTL, Custom Tags, EJB, JNDI, Hibernate, ANT, JUnit and Apache Log4J, Web Services, Message Queue (MQ).
- Designed GUI prototypes using ADF 11g GUI components before finalizing them for development.
- Used Cascading Style Sheets (CSS) to attain uniformity across all pages.
- Created reusable components (ADF Libraries and ADF Task Flows).
- Experience using version control systems such as CVS, PVCS, and Rational ClearCase.
- Created modules using bounded and unbounded task flows.
- Generated WSDL (web services) and created workflows using BPEL.
- Handled AJAX functions (partial trigger, partial submit, auto submit).
- Created the skin for the layout.
Environment: Core Java, Servlets, JSF, ADF Rich Client UI Framework, ADF-BC (BC4J) 11g, web services using Oracle SOA (BPEL), Oracle WebLogic.