Big Data Engineer Resume
TX
PROFESSIONAL SUMMARY:
- Around 7 years of IT experience as a Developer, Designer and Quality Reviewer, with cross-platform integration experience using Hadoop, Java, J2EE and SOA.
- Skilled in installing, configuring and using Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Flume, YARN, Spark, Kafka and Oozie.
- Strong understanding of Hadoop daemons and MapReduce concepts.
- Strong experience in importing and exporting data into and out of HDFS.
- Experienced in developing UDFs for Hive using Java.
- Worked with Apache Falcon, a data governance engine that defines, schedules and monitors data management policies.
- Hands on experience with Hadoop, HDFS, MapReduce and Hadoop Ecosystem (Pig, Hive, Oozie, Flume and HBase).
- Strong understanding of and working knowledge in NoSQL databases like HBase, MongoDB and Cassandra.
- Experience in working with Angular 4, Node.js, Bookshelf, Knex and MariaDB.
- Understanding of data storage and retrieval techniques, ETL and databases, including graph stores, relational databases and tuple stores.
- Good skills in developing reusable solutions to maintain proper coding standards across different Java projects.
- Good exposure to Python programming.
- Expertise in debugging and performance tuning of Oracle and Java applications, with strong knowledge of Oracle 11g and SQL.
- Ability to work effectively in cross-functional team environments and experience of providing training to business users.
- Good experience in using Sqoop for traditional RDBMS data pull.
- Good working knowledge of Flume.
- Worked with the Apache Ranger console to create and manage policies for access to files, folders, databases, tables, or columns.
- Worked with the YARN Queue Manager to allocate queue capacities for different service accounts.
- Hands on experience on Hortonworks and Cloudera Hadoop environments.
- Familiar with handling complex data processing jobs using Cascading.
- Strong database skills in IBM DB2 and Oracle; proficient in database development, including Constraints, Indexes, Views, Stored Procedures, Triggers and Cursors.
- Extensive experience in Shell scripting.
- Experience in component design using UML: Use Case, Class, Sequence, Deployment and Component diagrams for the requirements.
- Excellent analytical and programming abilities in using technology to create flexible and maintainable solutions for complex development problems.
- Good communication and presentation skills, willing to learn, adapt to new technologies and third party products.
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Spark, Kafka, Storm and ZooKeeper.
NoSQL Databases: HBase, Cassandra, MongoDB
Languages: C, C++, Java, Python, Scala, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts
Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery
Frameworks: MVC, Struts, Spring, Hibernate
Operating Systems: HP-UX, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata
Tools and IDE: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer
Version control: SVN, CVS, GIT
PROFESSIONAL EXPERIENCE:
Confidential, TX
Big Data Engineer
Responsibilities:
- Part of the planning/migration team for application migration from the MapR distribution to the HDP environment.
- Reviewed application architectures for a better understanding of dependencies, file formats, types of data, tools, service accounts and other factors important for migrating the applications to the HDP platform.
- Coordinated with teams to resolve issues regarding workflows, schemas, scripts and the Kerberized environment.
- Used Apache Falcon for mirroring HDFS and Hive data.
- Used Apache Falcon to design data pipelines and trace them for dependencies, tagging, audits and lineage.
- Worked with the Apache Ranger console to manage policies for access to files, folders, databases, tables, or columns.
- Used HBase snapshots to migrate HBase tables.
- Worked in a Kerberized environment.
- Designed workflows with Oozie and scheduled them with Falcon.
- Ingested various types of data into Hive using the ELake Ingestion Framework, which internally uses Pig, Hive and Spark for data processing.
- Worked with Hortonworks support to resolve issues with tools such as Hive, HBase and Falcon.
- Worked with Avro schemas for Hive.
- Created Hive tables on top of HBase using the HBase storage handler for effective OLAP analysis (see the sketch after this list).
- Worked with Flume to ingest data from MySQL into HDFS.
- Worked with Node.js to extract Apache Ranger policies from REST endpoints on different clusters and store them in MariaDB.
- Used Knex as the query builder and Bookshelf as the ORM.
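A minimal sketch of what the Hive-on-HBase mapping referenced above can look like, issued from Scala over Hive JDBC; the connection URL, credentials, table name and column mapping are illustrative placeholders, not the actual project objects, and the Hive JDBC driver and HBase handler jars are assumed to be on the classpath.

```scala
import java.sql.DriverManager

object HiveOnHBaseTable {
  def main(args: Array[String]): Unit = {
    // Load the HiveServer2 JDBC driver (hive-jdbc jar assumed on the classpath).
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection(
      "jdbc:hive2://hiveserver2-host:10000/default", "hive_user", "")
    val stmt = conn.createStatement()

    // External Hive table over an existing HBase table: the column mapping ties
    // the Hive columns to the HBase row key and a column family "cf".
    stmt.execute(
      """CREATE EXTERNAL TABLE IF NOT EXISTS sales_hbase (
        |  rowkey     STRING,
        |  product_id STRING,
        |  amount     DOUBLE
        |)
        |STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
        |WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:product_id,cf:amount")
        |TBLPROPERTIES ("hbase.table.name" = "sales")""".stripMargin)

    stmt.close()
    conn.close()
  }
}
```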
Environment: Hadoop, Hortonworks, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, Falcon, Linux, XML, MySQL, HBase.
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Prepared an ETL framework with the help of Sqoop, Pig and Hive to frequently bring in data from the source and make it available for consumption.
- Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
- Developed analytical components using Scala, Spark and Spark Streaming.
- Experienced with NoSQL databases like HBase, MongoDB and Cassandra.
- Involved in Cassandra data modeling to create keyspaces and tables in the Amazon cloud environment.
- Developed ETL jobs using Spark/Scala to migrate data from Oracle to new Cassandra tables (see the sketch after this list).
- Rigorously used Spark/Scala (RDDs, DataFrames, Spark SQL) and the Spark-Cassandra Connector APIs for various tasks (data migration, business report generation, etc.).
- Developed a Spark Streaming application for real-time sales analytics.
- Built a real-time pipeline for streaming data using Kafka and Spark Streaming (see the streaming sketch after this list).
- Experience in migrating data across cloud environments to Amazon EC2 clusters.
- Performed EC2-to-S3 data sync.
- Analyzed the SQL scripts and designed the solution to implement using PySpark.
- Extracted the data from other data sources into HDFS using Sqoop.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Imported and exported data into and out of HDFS using Sqoop and Flume.
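A minimal sketch of what the Oracle-to-Cassandra migration job mentioned above can look like in Spark/Scala, assuming the Spark-Cassandra Connector and an Oracle JDBC driver are on the classpath; hostnames, credentials, keyspace and table names are illustrative placeholders, not the actual project objects.

```scala
import org.apache.spark.sql.SparkSession

object OracleToCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("oracle-to-cassandra")
      .config("spark.cassandra.connection.host", "cassandra-host")
      .getOrCreate()

    // Read the source table from Oracle over JDBC.
    val ordersDf = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCL")
      .option("dbtable", "SALES.ORDERS")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("ORACLE_PWD", ""))
      .load()

    // Light transformation before the write, e.g. normalizing column names.
    val cleaned = ordersDf
      .withColumnRenamed("ORDER_ID", "order_id")
      .withColumnRenamed("ORDER_TOTAL", "order_total")

    // Write to the target Cassandra table (keyspace and table must already exist).
    cleaned.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "sales", "table" -> "orders"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```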
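For the Kafka and Spark Streaming pipeline mentioned above, a minimal illustrative sketch using the spark-streaming-kafka-0-10 integration; the broker address, topic name and record layout are assumptions for illustration, not the actual project values.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object SalesStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sales-streaming")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafka-broker:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "sales-analytics",
      "auto.offset.reset" -> "latest")

    // Direct stream from the (placeholder) "sales-events" topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("sales-events"), kafkaParams))

    // Parse "store,amount" records and aggregate sales per store in each batch.
    stream.map(_.value.split(","))
      .map(fields => (fields(0), fields(1).toDouble))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```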
Environment: CDH5, Spark, Cassandra, Kafka, Scala, Hive, Sqoop, Pig, Linux, XML, MySQL, PL/SQL, SQL connector
Confidential, Wayne, PA
Hadoop/ Spark Developer
Responsibilities:
- Developed Spark code using Scala and Spark-SQL for faster testing and processing of data.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, Pair RDDs and Spark on YARN.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Developed a code base to stream data from sample data files > Kafka > Kafka Spout > Storm Bolt > HDFS Bolt.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala (see the sketch after this list).
- Developed multiple POCs using PySpark, deployed them on the YARN cluster and compared the performance of Spark with Hive and SQL/Teradata.
- Analyzed the SQL scripts and designed the solution to implement using PySpark.
- Uploaded data to Hadoop Hive and combined new tables with existing databases.
- Deployed the Cassandra cluster in cloud (Amazon AWS) environment with scalable nodes as per the business requirement.
- Generated the data cubes using Hive, Pig and Java MapReduce on the provisioned Hadoop cluster in AWS.
- Implemented the ETL design to load the MapReduce data cubes into the Cassandra cluster.
- Applied understanding of data storage and retrieval techniques, ETL and databases, including graph stores, relational databases and tuple stores.
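A minimal sketch of how a Hive/SQL query can be expressed both through Spark SQL and as equivalent DataFrame transformations, as referenced above; the database, table and column names are illustrative placeholders, not the actual project objects.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveAnalytics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-analytics")
      .enableHiveSupport()          // read Hive tables through the metastore
      .getOrCreate()

    // A Hive/SQL aggregation run directly through Spark SQL.
    val bySql = spark.sql(
      """SELECT region, SUM(amount) AS total_sales
        |FROM sales_db.transactions
        |GROUP BY region""".stripMargin)

    // The same query expressed as DataFrame transformations.
    val byApi = spark.table("sales_db.transactions")
      .groupBy("region")
      .agg(sum("amount").alias("total_sales"))

    bySql.show()
    byApi.show()

    spark.stop()
  }
}
```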
Environment: Hadoop, MapReduce, HDFS, Hive, Apache Spark, Apache Kafka, Apache Cassandra, Apache Storm, Java (JDK 1.6), SQL, Cloudera Manager, Sqoop, Flume, Oozie, Eclipse.
Confidential, Madison, WI
Hadoop Developer
Responsibilities:
- Responsible for understanding the scope of the project and requirement gathering.
- Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review.
- Created & documented Test Strategy, scenarios and procedures.
- Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
- Conducted POCs for ingesting data using Flume.
- Created Hive queries and tables that helped line of business identify trends by applying strategies on historical data before promoting them to production.
- Created views for restricting data access by business area.
- Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis.
- Performed structured application code reviews and walkthroughs.
- Conducted and participated in project team meetings to gather status and discuss issues and action items.
- Provided support for research and resolution of testing issues.
- Coordinated with the business for UAT sign-off.
- Created the implementation plan and detailed task schedules.
Environment: Hadoop, HDFS, Hive, Pig, Sqoop, DB2, SQL, Linux, Autosys, IBM Data Studio, WinSCP, UltraEdit, NDM, Quality Center 9.2, Windows & Microsoft Office.
Confidential
Java Developer
Responsibilities:
- Implemented J2EE standards and MVC2 architecture using the Struts framework
- Implemented Servlets, JSP and Ajax to design the user interface
- Used JSP, JavaScript, HTML5 and CSS for manipulating, validating and customizing error messages in the user interface
- Used JBoss for EJB and JTA, and for caching and clustering purposes
- Used EJBs (session beans) to implement the business logic, JMS for sending updates to various other applications and MDBs for routing priority requests
- All the business logic in all the modules was written in core Java
- Wrote web services using SOAP for sending data to and getting data from the external interface
- Used XSL/XSLT for transforming and displaying reports; developed schemas for XML
- Developed web-based reporting for the monitoring system with HTML and Tiles using the Struts framework
- Used design patterns such as Business Delegate, Service Locator, Model View Controller, Session and DAO
- Implemented the presentation layer with HTML, XHTML, JavaScript, and CSS
- Developed web components using JSP, Servlets and JDBC
- Involved in fixing defects and unit testing with test cases using JUnit
- Developed user and technical documentation
- Made extensive use of Java Naming and Directory interface (JNDI) for looking up enterprise beans
- Developed presentation layer using HTML, CSS and JavaScript
- Developed stored procedures and triggers in PL/SQL
Environment: Java (multithreading, collections), J2EE, EJB, UML, SQL, PHP, Sybase, Eclipse, JavaScript, WebSphere, JBoss, HTML5, DHTML, CSS, XML, ANT, Struts 1.3.8, JUnit, JSP, Servlets, Rational Rose, Hibernate, JDBC, MySQL, Apache Tomcat.