Data Engineer/Developer Resume
Boston, MA
PROFESSIONAL SUMMARY:
- 7+ years of experience in the IT industry, including 3+ years in Big Data practices and technologies such as HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark, and Kafka.
- 3 years of experience in the design and development of multi-tier applications using JDK 1.8, J2EE, and Spring frameworks.
- Expertise in developing Spark Streaming applications in Python and Scala using Spark RDDs, DataFrames, and Spark SQL.
- Expertise in creating Kafka topics and MapR Event Streams topics.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing (a minimal sketch follows this summary).
- Worked with NoSQL databases such as HBase and MapR-DB to extract information and store large volumes of data.
- Extensive experience in Shell scripting.
- Experience with the Oozie workflow engine, running workflows with Sqoop, Pig, and Hive actions.
- Imported and exported data between relational databases such as MySQL and Oracle and HDFS/Hive using Sqoop.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
- Experience with cluster monitoring tools such as MapR Control System (MCS) and Cloudera Manager.
- Experience in developing real time streaming pipelines using Apache Kafka, MapR Streams and Spark Streaming.
- Knowledge of different Hadoop distributions such as Cloudera and MapR.
- Worked on data streaming and ETL tools such as StreamSets Data Collector and Attunity Replicate.
- Used Agile (SCRUM) methodologies for Software Development.
- Worked on Hive queries for data analysis, data transfer, and table design to load data into the Hadoop environment.
- Worked closely with business partners to understand business requirements and design solutions based on them.
- Extensively involved in writing PL/SQL, stored procedures, functions and packages.
- Experience in writing UDFs and configuring cron jobs.
- Good knowledge of reporting and data visualization tools such as Oracle Data Visualization Desktop, Tableau, and Grafana.
- Outstanding communication and presentation skills, willing to learn, adapt to new technologies and third-party products.
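The sketch below is a minimal illustration of the Kafka-to-Spark Streaming integration referenced above; the broker address, topic name, and batch interval are illustrative assumptions rather than values from a specific engagement.

```python
# Minimal PySpark sketch of a Kafka -> Spark Streaming consumer (Spark 2.x, Kafka 0.8 API).
# Requires the spark-streaming-kafka-0-8 package on the classpath.
# Broker address, topic name, and batch interval are illustrative placeholders.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-spark-streaming-sketch")
ssc = StreamingContext(sc, 10)  # 10-second micro-batches

# Receiver-less direct stream; each record arrives as a (key, value) pair.
stream = KafkaUtils.createDirectStream(
    ssc, ["events"], {"metadata.broker.list": "broker1:9092"})

# Count messages per value in each micro-batch and print a sample to the driver log.
stream.map(lambda kv: kv[1]).countByValue().pprint()

ssc.start()
ssc.awaitTermination()
```

The direct (receiver-less) stream reads Kafka partitions in parallel and tracks offsets through Spark checkpoints, which keeps throughput high for the kind of processing described above.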
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, Hive, MapReduce, Spark, Sqoop, HBase, Kafka, Oozie, ZooKeeper
NoSQL Databases: HBase, MapR-DB
Languages: Python, Scala, Core Java, Unix Shell scripts, SQL
Web/Application Server: Apache Tomcat
Java Technologies: J2EE, Servlets, JSP, JDBC, JavaBeans, JavaScript, JMS
Databases: Oracle, DB2, SQL Server, MySQL
IDEs: Eclipse, IntelliJ, DbVisualizer
Other Tools & Packages: CVS, SVN, JUnit, Maven, ANT, GitHub, StreamSets Data Collector, Oracle DVD, Grafana, Tableau
SDLC Methodology: Agile, Waterfall model
Operating Systems: Linux, UNIX, Windows
Office Tools: MS Office (Word, PowerPoint)
PROFESSIONAL EXPERIENCE:
Confidential - Boston, MA
Data Engineer/Developer
Responsibilities:
- Involved in Requirement analysis, Design, development and testing of the application.
- Configured Kafka Connect JDBC with SAP HANA and MapR Streams for both real-time streaming and batch processing.
- Created MapR Event Streams and Kafka topics.
- Worked on Attunity Replicate to load data from SAP ECC to Apache Kafka topics.
- Developed a Spark Streaming application in Python to stream data from MapR Event Streams and Apache Kafka topics into Hive and MapR-DB, and to replicate data from one topic to another within MapR Event Streams (see the sketch at the end of this section).
- Worked with DStreams (Discretized Streams), RDDs (Resilient Distributed Datasets), DataFrames, and Spark SQL to build the Spark Streaming application.
- Created SQL queries to extract data and perform joins on tables in SAP HANA and MySQL.
- Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data coming from source systems.
- Implemented Partitioning, Dynamic Partition, and Bucketing in Hive for efficient data access.
- Used Hue and MapR Control System (MCS) to monitor and troubleshoot Spark jobs.
- Developed Sqoop scripts to move data from MapR-FS to SAP HANA.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Created stored procedures in MySQL to improve data handling and ETL Transactions.
- Performed data validation using Hive and wrote Hive UDFs.
- Configured the SAP HANA source connector with SAP HANA as the source and an Apache Kafka topic as the target for real-time streaming and batch processing.
- Developed Streaming application to stream data from MapR ES to HBase.
- Streamed data from Apache Kafka topics to the time-series database OpenTSDB.
- Built dashboards and visualizations on top of MapR-DB and Hive using Oracle Data Visualization Desktop, and built real-time visualizations on top of OpenTSDB using Grafana.
- Automated ETL processes using UNIX shell scripts.
Environment: MapR 6.0, Apache Kafka 1.0.0, Hive 2.1, HBase 1.1.8, Hue, MapR-DB, MapR-FS, Attunity Replicate, Spark 2.1.0, Spark SQL, Python, SAP HANA, Sqoop, Oozie, Pig, SAP ECC, IntelliJ, Kafka Connect Framework, DbVisualizer, Oracle Data Visualization Desktop, StreamSets Data Collector, MapR-ES, MySQL, Git.
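A minimal sketch of the Kafka/MapR Streams-to-Hive leg of the streaming application described above, assuming JSON message payloads; the broker, topic, and table names are hypothetical placeholders, not the actual project configuration.

```python
# Sketch: consume JSON messages from a Kafka/MapR Streams topic and append each
# micro-batch to a Hive table. Broker, topic, and table names are hypothetical.
import json
from pyspark.sql import SparkSession, Row
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

spark = (SparkSession.builder
         .appName("kafka-to-hive-sketch")
         .enableHiveSupport()
         .getOrCreate())
ssc = StreamingContext(spark.sparkContext, 30)  # 30-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc, ["orders"], {"metadata.broker.list": "broker1:9092"})

def write_batch(rdd):
    # Convert one micro-batch of JSON strings into a DataFrame and append it to Hive.
    if not rdd.isEmpty():
        rows = rdd.map(lambda kv: Row(**json.loads(kv[1])))
        spark.createDataFrame(rows).write.mode("append").saveAsTable("staging.orders")

stream.foreachRDD(write_batch)
ssc.start()
ssc.awaitTermination()
```

The same foreachRDD pattern can target MapR-DB or a second topic instead of Hive by swapping out the write step.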
Confidential - Minneapolis, MN
Data Engineer/Developer
Responsibilities:
- Enhanced the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and Spark RDDs.
- Worked on MySQL to identify the tables and views required for export into HDFS.
- Loaded data from MySQL into HDFS on the development cluster for validation and cleansing.
- Created Apache Kafka topics.
- Configured StreamSets Data Collector with Apache Kafka to stream real-time data from different sources (databases and files) into Kafka topics.
- Developed a streaming application using Spark and Python to stream data from Kafka topics to Hive.
- Worked on real-time and batch processing of data sources using Apache Spark, Elasticsearch, Spark Streaming, and Apache Kafka.
- Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Spark SQL (see the sketch at the end of this section).
- Migrated Pig scripts and MapReduce programs to the Spark DataFrame API and Spark SQL to improve performance.
- Conducted POCs for real-time streaming of data from MySQL to Hive and HBase.
- Worked with SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance and storage improvements.
- Handled import and export of large data sets between various data sources and HDFS using Sqoop, and performed transformations using Hive.
- Built dashboards and visualizations on top of Hive using Tableau, published the reports to Tableau Online, and embedded them in the browser using an iframe.
- Monitored the Hadoop cluster through Cloudera Manager and implemented alerts based on error messages.
- Used the JavaMail API to send text and email notifications to customers.
- Loaded data from the UNIX file system into HDFS.
Environment: Cloudera, Apache Kafka, HDFS, Python, Hive, Spark, Spark SQL, Pig, MapReduce, Sqoop, IntelliJ, Tableau, StreamSets Data Collector, UNIX, MySQL, Git.
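A minimal sketch of converting a Hive/SQL aggregation into equivalent Spark DataFrame transformations, as described above; the table and column names are hypothetical placeholders.

```python
# Sketch: the same aggregation expressed once as Spark SQL and once as DataFrame
# transformations. Table and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("sql-to-dataframe-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Original Hive-style query, run as-is through Spark SQL.
sql_result = spark.sql("""
    SELECT customer_id, COUNT(*) AS order_count
    FROM staging.orders
    WHERE order_status = 'COMPLETE'
    GROUP BY customer_id
""")

# The same logic expressed as DataFrame transformations.
df_result = (spark.table("staging.orders")
             .filter(F.col("order_status") == "COMPLETE")
             .groupBy("customer_id")
             .agg(F.count("*").alias("order_count")))

# Either result can be written back to Hive for reporting.
df_result.write.mode("overwrite").saveAsTable("analytics.customer_order_counts")
```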
Confidential - San Diego, CA
Data Engineer
Responsibilities:
- Tested SQL scripts for report development and effectively handled performance issues.
- Performed requirement-driven data cleaning using SQL to ensure data quality, completeness, and accuracy.
- Worked extensively with Flume to import data from various web servers into HDFS.
- Developed end-to-end Spark applications in Scala to perform data cleansing, validation, transformation, and summarization according to requirements (see the sketch at the end of this section).
- Monitored systems and services through Cloudera Manager to keep the clusters available for the business.
- Worked on Flume to load the log data from multiple sources directly into HDFS.
- Developed Sqoop scripts for importing and exporting data into HDFS and Hive.
- Used ZooKeeper to store the offsets of messages consumed for a specific topic and partition by a specific consumer group in Kafka.
- Worked with HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Scheduled the workflows using Oozie workflow scheduler.
- Worked in Agile and used JIRA to maintain the project stories.
- Worked with Sqoop import and export functionality to handle large data set transfers between the DB2 database and HDFS.
- Created Hive queries to extract data and deliver it to clients.
Environment: Apache Kafka, HDFS, Hive, Spark, Git, MySQL, Pig, Python, DB2, Cron Jobs, UNIX, Spark SQL, Sqoop, Oozie.
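A minimal sketch of the cleansing/validation/summarization pattern described above. The applications in this role were written in Scala; this Python version shows the same DataFrame operations for consistency with the other sketches, and the file paths and column names are hypothetical.

```python
# Sketch of a cleansing -> validation -> summarization pipeline with Spark DataFrames.
# File paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cleansing-sketch").getOrCreate()

raw = spark.read.option("header", "true").csv("/data/raw/transactions")

cleaned = (raw
           .dropDuplicates(["transaction_id"])            # remove duplicate records
           .filter(F.col("amount").isNotNull())           # drop rows that fail validation
           .withColumn("amount", F.col("amount").cast("double")))

# Summarize per day for downstream reporting.
summary = (cleaned
           .groupBy("transaction_date")
           .agg(F.sum("amount").alias("total_amount"),
                F.count("*").alias("transaction_count")))

summary.write.mode("overwrite").parquet("/data/curated/daily_transaction_summary")
```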
Confidential
Java Developer
Responsibilities:
- Involved in the implementation of design using vital phases of the Software development life cycle.
- Involved in design, development and testing of the application.
- Implemented the object-oriented programming concepts for validating the columns of the import file.
- Responsible for creating RESTful Web services using JAX-RS.
- Used a DOM parser to parse the XML files.
- Implemented a complex back-end component using Java multithreading to compute counts quickly against a large MySQL database (about 40 million rows).
- Worked in Agile development following the Scrum process, with sprints and daily stand-up meetings.
- Developed front-end screens using JSP, HTML, jQuery, JavaScript, and CSS.
- Participated in OOAD, domain modeling, and system architecture.
- Used WinSCP to transfer files between the local system and remote systems.
- Created unit test cases before the QA release.
- Worked closely with the QA team and coordinated on fixes.
Environment: Java, Core Java, Apache Tomcat, Maven, JavaScript, RESTful Web Services, WebLogic, JBoss, Eclipse IDE, Apache CXF, FTP, HTML, CSS.
Confidential
Java/J2EE Developer
Responsibilities:
- Analyzed and modified existing code wherever required and participated in developing the designs document.
- Used Rational Rose for model-driven development and UML modeling.
- Active in the analysis, design, implementation, and deployment phases across the full software development life cycle (SDLC) of the project.
- Developed the presentation layer using JSP, HTML, and CSS.
- Performed form validations using JavaScript and Struts validators.
- Used JDBC to establish connections with the Oracle database and communicated with it using PL/SQL.
- Participated in understanding of business requirements, design and development of the project.
- Migrated to a Struts based system from the existing JSP/ Servlets/ Beans based application.
- Developed JSP pages and client-side validation using JavaScript.
- Developed web services for sending data to and receiving data from different applications.
- Implemented the Front Controller design pattern.
- Resolved critical bugs.
Environment: Java, J2EE Servlet, JSF 2, XML, JSON, HTML, CSS, JQuery, Spring 3.0, Log4j, Git, Maven, Eclipse, Apache Tomcat 6, and Oracle 11g.