
Data Engineer/Developer Resume


Boston, MA

PROFESSIONAL SUMMARY:

  • Over 7 years of experience in the IT industry, including 3+ years of experience in Big Data practices and technologies such as HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark, and Kafka.
  • 3 years of extensive experience in the design and development of multi-tier applications using JDK 1.8, J2EE, and Spring frameworks.
  • Expertise in developing Spark Streaming applications in Python and Scala using Spark RDDs, DataFrames, and Spark SQL.
  • Expertise in creating Kafka topics and MapR Streams/topics.
  • Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.
  • Worked with NoSQL databases such as HBase and MapR-DB for information extraction and for storing large volumes of data.
  • Extensive experience in Shell scripting.
  • Experience with the Oozie workflow engine, running workflows with actions that execute Sqoop, Pig, and Hive jobs.
  • Imported and exported data between relational databases such as MySQL and Oracle and HDFS/Hive using Sqoop.
  • Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
  • Experience in cluster monitoring tools like MapR Control System (MCS), Cloudera Manager.
  • Experience in developing real time streaming pipelines using Apache Kafka, MapR Streams and Spark Streaming.
  • Knowledge on different distributions of Hadoop like Cloudera and MapR.
  • Worked on data streaming and ETL tools such as StreamSets Data Collector and Attunity Replicate.
  • Used Agile (SCRUM) methodologies for Software Development.
  • Wrote Hive queries for data analysis and data transfer, and designed tables to load data into the Hadoop environment.
  • Worked closely with business partners to understand business requirements and designed solutions based on those requirements.
  • Extensively involved in writing PL/SQL, stored procedures, functions and packages.
  • Experience in writing UDFs and configuring CRON Jobs.
  • Good Knowledge on reporting and data visualization tools like Oracle Data Visualization Desktop, Tableau and Grafana.
  • Outstanding communication and presentation skills, willing to learn, adapt to new technologies and third-party products.
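To illustrate the Kafka-with-Spark-Streaming integration claimed above, the per-record transformation can be kept as a plain Python function so it is testable outside a cluster; in a real job it would be applied per message, for example via `DStream.map`. This is a hypothetical sketch — the field names (`user`, `ts`, `amount`) are assumptions, not from any actual project described here.

```python
import json

def parse_event(raw):
    """Parse one Kafka message (JSON bytes) into a flat record.

    In a Spark Streaming job this would be applied per message, e.g.
    stream.map(lambda kv: parse_event(kv[1])). Keeping it a plain
    function lets it be unit-tested without a cluster. Field names
    here are illustrative assumptions.
    """
    event = json.loads(raw)
    return {
        "user": event.get("user", "unknown"),
        "ts": int(event.get("ts", 0)),
        "amount": float(event.get("amount", 0.0)),
    }

record = parse_event(b'{"user": "u1", "ts": 1700000000, "amount": "12.5"}')
```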

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, Hive, MapReduce, Spark, Sqoop, HBase, Kafka, Oozie, Zookeeper

NoSQL Databases: HBase, MapR-DB

Languages: Python, Scala, Core Java, Unix Shell scripts, SQL

Web/Application Server: Apache Tomcat

Java Technologies: J2EE, Servlets, JSP, JDBC, Java Beans, Java Script, JMS

Databases: Oracle, DB2, SQL Server, MySQL

IDEs: Eclipse, IntelliJ, DB Visualizer

Other Tools & Packages: CVS, SVN, JUnit, Maven, Ant, GitHub, StreamSets Data Collector, Oracle DVD, Grafana, Tableau

SDLC Methodology: Agile, Waterfall model

Operating Systems: Linux, UNIX, Windows

Office Tools: MS Office (Word, PowerPoint)

PROFESSIONAL EXPERIENCE:

Confidential - Boston, MA

Data Engineer/Developer

Responsibilities:

  • Involved in Requirement analysis, Design, development and testing of the application.
  • Configured Kafka Connect JDBC with SAP HANA and MapR Streams for both real time streaming and batch process.
  • Created MapR-Event Streams and Kafka topics.
  • Worked on Attunity Replicate to load data from SAP ECC to Apache Kafka topics.
  • Developed Spark Streaming applications in Python to stream data from MapR Event Streams and Apache Kafka topics into Hive and MapR-DB, and to stream data from one topic to another within MapR Event Streams.
  • Worked with DStreams (Discretized Streams), RDDs (Resilient Distributed Datasets), DataFrames, and Spark SQL to build the Spark Streaming application.
  • Created SQL queries to extract data and perform joins on tables in SAP HANA and MySQL.
  • Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data coming from source systems.
  • Implemented Partitioning, Dynamic Partition, and Bucketing in Hive for efficient data access.
  • Used Hue and MapR Control System (MCS) to monitor and troubleshoot Spark jobs.
  • Developed Sqoop scripts to move data from MapR-FS to SAP HANA.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Created stored procedures in MySQL to improve data handling and ETL Transactions.
  • Worked on data validation using Hive and wrote Hive UDFs.
  • Configured SAP HANA source connector with SAP HANA as source and Apache Kafka topic as target for real time streaming and batch processing.
  • Developed Streaming application to stream data from MapR ES to HBase.
  • Streamed data from Apache Kafka topics to the time-series database OpenTSDB.
  • Built dashboards and visualizations on top of MapR-DB and Hive using Oracle Data Visualization Desktop, and built real-time visualizations on top of OpenTSDB using Grafana.
  • Worked on UNIX shell scripts and automation of the ETL processes using UNIX shell scripting.
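The Hive partitioning and bucketing mentioned above can be sketched as a small DDL generator; keeping the statement in a helper makes the layout choices explicit. This is a minimal illustration — the table, column, and bucket choices are placeholders, not the actual tables from this engagement.

```python
def hive_ddl(table, cols, partition_col, bucket_col, buckets):
    """Render a Hive CREATE TABLE statement with partitioning and
    bucketing, mirroring the optimizations described above. All
    names here are illustrative placeholders."""
    col_defs = ", ".join(f"{name} {dtype}" for name, dtype in cols)
    return (
        f"CREATE TABLE {table} ({col_defs}) "
        f"PARTITIONED BY ({partition_col} STRING) "
        f"CLUSTERED BY ({bucket_col}) INTO {buckets} BUCKETS "
        f"STORED AS ORC"
    )

ddl = hive_ddl("events", [("id", "BIGINT"), ("payload", "STRING")],
               "load_date", "id", 32)
```

Partitioning by a date column prunes whole directories at query time, while bucketing on the join key enables bucket map joins.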

Environment: MapR 6.0, Apache Kafka 1.0.0, Hive 2.1, HBase 1.1.8, Hue, MapR-DB, MapR-FS, Attunity Replicate, Spark 2.1.0, Spark SQL, Python, SAP HANA, Sqoop, Oozie, Pig, SAP ECC, IntelliJ, Kafka Connect Framework, DB Visualizer, Oracle Data Visualization Desktop, StreamSets Data Collector, MapR-ES, MySQL, Git.

Confidential - Minneapolis, MN

Data Engineer/Developer

Responsibilities:

  • Enhanced the performance and optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and Spark RDDs.
  • Worked on MySQL for identifying required tables and views to export into HDFS.
  • Loaded data from MySQL to HDFS to development cluster for validation and cleansing.
  • Created Apache Kafka topics.
  • Configured StreamSets Data Collector with Apache Kafka to stream real-time data from different sources (databases and files) into Kafka topics.
  • Developed a streaming application to stream data from Kafka topics to Hive using Spark and Python.
  • Worked on real-time and batch processing of data sources using Apache Spark, Elasticsearch, Spark Streaming, and Apache Kafka.
  • Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Spark SQL.
  • Migrated Pig scripts and MapReduce programs to the Spark DataFrames API and Spark SQL to improve performance.
  • Conducted POCs for real-time streaming of data from MySQL to Hive and HBase.
  • Worked on Sequence files, RC files, Map side joins, bucketing, Partitioning for Hive performance enhancement and storage improvement.
  • Handled importing and exporting of large data sets between various data sources and HDFS using Sqoop, performed transformations using Hive, and loaded data into HDFS.
  • Built dashboards and visualizations on top of Hive using Tableau, and published the reports to Tableau Online accounts and to the browser via iframes.
  • Monitored the Hadoop cluster through Cloudera Manager and implemented alerts based on error messages.
  • Worked on the JavaMail API to send text and email notifications to customers.
  • Loaded data from UNIX file system to HDFS.
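The Sqoop imports from DB2 described above are driven from the command line; a script typically assembles the argument list before invoking the tool. The sketch below builds (but does not run) such a command — the JDBC URL, table, and target directory are hypothetical stand-ins.

```python
def sqoop_import_cmd(jdbc_url, table, target_dir, num_mappers=4):
    """Assemble a `sqoop import` command line like the DB2-to-HDFS
    imports described above. Connection details are placeholders;
    in production the list could be passed to
    subprocess.run(cmd, check=True)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]

cmd = sqoop_import_cmd("jdbc:db2://db-host:50000/SALES", "ORDERS",
                       "/data/raw/orders")
```

Building the command as a list (rather than one shell string) avoids quoting problems when paths or passwords contain special characters.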

Environment: Cloudera, Apache Kafka, HDFS, Python, Hive, Spark, Spark SQL, Pig, MapReduce, Sqoop, IntelliJ, Tableau, StreamSets Data Collector, UNIX, MySQL, Git.

Confidential - San Diego, CA

Data Engineer

Responsibilities:

  • Involved in testing SQL Scripts for report development and handled the performance issues effectively.
  • Performed data cleaning as required using SQL to ensure data quality, completeness, and accuracy.
  • Worked extensively with Flume for importing data from various webservers to HDFS.
  • Developed end to end Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirement.
  • Monitored systems and services through Cloudera Manager to keep the clusters available for the business.
  • Worked on Flume to load the log data from multiple sources directly into HDFS.
  • Developed Sqoop scripts for importing and exporting data into HDFS and Hive.
  • Used Zookeeper to store offsets of messages consumed for a specific topic and partition by a specific Consumer Group in Kafka.
  • Worked with HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Scheduled the workflows using Oozie workflow scheduler.
  • Worked in an Agile environment and used JIRA to maintain project stories.
  • Used Sqoop import and export functionality to handle large data set transfers between the DB2 database and HDFS.
  • Created Hive queries for extracting data and sending it to clients.
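The Oozie scheduling mentioned above is defined in a `workflow.xml` document; the skeleton can be generated programmatically. The sketch below builds a minimal start/action/end graph — node names are illustrative, and a real action node would additionally contain a Hive, Sqoop, or shell action body.

```python
import xml.etree.ElementTree as ET

def oozie_workflow(name):
    """Build a minimal Oozie workflow skeleton (start -> action ->
    end, with a kill node for errors), similar in shape to the
    scheduled workflows described above. Node names are placeholders,
    and the action body (e.g. a Hive or shell action) is omitted."""
    app = ET.Element("workflow-app",
                     {"xmlns": "uri:oozie:workflow:0.5", "name": name})
    ET.SubElement(app, "start", {"to": "etl-node"})
    action = ET.SubElement(app, "action", {"name": "etl-node"})
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})
    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Workflow failed"
    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")

xml_doc = oozie_workflow("daily-etl")
```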

Environment: Apache Kafka, HDFS, Hive, Spark, Git, MySQL, Pig, Python, DB2, cron jobs, UNIX, Spark SQL, Sqoop, Oozie.

Confidential

Java Developer

Responsibilities:

  • Involved in the implementation of design using vital phases of the Software development life cycle.
  • Involved in design, development and testing of the application.
  • Implemented the object-oriented programming concepts for validating the columns of the import file.
  • Responsible for creating RESTful Web services using JAX-RS.
  • Used a DOM parser to parse XML files.
  • Implemented a complex back-end component to quickly compute counts against a large MySQL database (about 40 million rows) using Java multithreading.
  • Experienced working in Agile development following the SCRUM process, with sprints and daily stand-up meetings.
  • Developed front-end screens using JSP, HTML, jQuery, JavaScript, and CSS.
  • Participated in OOAD, domain modeling, and system architecture.
  • Used WinSCP to transfer files between local and remote systems.
  • Wrote test cases for unit testing before the QA release.
  • Worked closely with the QA team and coordinated on fixes.

Environment: Java, Core Java, Apache Tomcat, Maven, JavaScript, RESTful Web Services, Web logic, JBoss, Eclipse IDE, Apache CXF, FTP, HTML, CSS.

Confidential

Java/J2EE Developer

Responsibilities:

  • Analyzed and modified existing code wherever required and participated in developing the designs document.
  • Used Rational Rose for model-driven development and UML modeling.
  • Responsible and active in the analysis, design, implementation and deployment of full software development life-cycle (SDLC) of the project.
  • Developed the presentation layer using JSP, HTML, and CSS.
  • Performed form validations using JavaScript and Struts validators.
  • Used JDBC to establish connections with the Oracle database and communicated with the database using PL/SQL.
  • Participated in understanding of business requirements, design and development of the project.
  • Migrated the existing JSP/Servlets/Beans-based application to a Struts-based system.
  • Developed JSP pages and client-side validation using JavaScript.
  • Developed web services for sending and receiving data between different applications.
  • Implemented the Front Controller design pattern.
  • Resolved critical bugs.

Environment: Java, J2EE, Servlets, JSF 2, XML, JSON, HTML, CSS, jQuery, Spring 3.0, Log4j, Git, Maven, Eclipse, Apache Tomcat 6, and Oracle 11g.
