
Data Engineering Resume


Tampa, FL

SUMMARY

  • Proactive IT developer with 4 years of working experience in Java/J2EE technology and in the design and development of scalable systems using Hadoop technologies across various environments.
  • Experienced in installing, configuring, supporting, and managing Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4) distributions on Confidential Web Services (AWS).
  • Strong understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode, and the HDFS framework.
  • Extensive experience in analyzing data using the Hadoop ecosystem, including HDFS, Hive, Pig, Sqoop, Flume, MapReduce, Spark, Kafka, HBase, Oozie, Solr, and ZooKeeper.
  • Capable of processing large sets of structured, semi-structured, and unstructured data and supporting the application architecture of those systems.
  • Extensive knowledge of NoSQL databases like HBase, Cassandra, and MongoDB.
  • Configured ZooKeeper, Cassandra, and Flume on the existing Hadoop cluster.
  • Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
  • Experience in creating custom UDFs for Pig and Hive to incorporate methods and functionality written in Python/Java into Pig Latin and HQL (HiveQL).
  • Experience in converting Hive queries into Spark transformations using Spark RDDs and Scala (a sketch of this pattern follows this summary).
  • Hands-on experience in troubleshooting errors in HBase Shell, Pig, Hive, and MapReduce.
  • Hands-on experience in provisioning and managing multi-tenant Cassandra clusters on public cloud environments: Confidential Web Services (AWS) EC2 and OpenStack.
  • Designed and developed solutions for real-time data ingestion using Kafka, Storm, Spark Streaming, and various NoSQL databases.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
  • Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
  • Good hands-on experience in creating RDDs and DataFrames for the required input data and performing data transformations using Spark with Scala.
  • Knowledge of developing a NiFi flow prototype for data ingestion into HDFS.
  • Experience in analyzing, designing, and developing ETL strategies and processes, writing ETL specifications, and Informatica development.
  • Extensive experience working with Oracle, DB2, SQL Server, PL/SQL, and MySQL databases, and with core Java concepts like OOP, multithreading, collections, and I/O.
  • Experienced in designing web applications using HTML5, CSS3, JavaScript, JSON, jQuery, AngularJS, Bootstrap, and Ajax on the Windows operating system.
  • Experience in Service-Oriented Architecture using web services such as SOAP and REST.
  • Knowledge of service-oriented architecture (SOA), workflows, and web services using XML, SOAP, and WSDL.
  • Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, and EJB.
  • Experience in working with the Tableau visualization tool using Tableau Desktop, Tableau Server, and Tableau Reader.
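
The Hive-to-Spark conversion work noted above was done in Scala; the minimal sketch below illustrates the same pattern with Spark's Java DataFrame API (to keep all sketches in this document in one language). The web_logs table and the status/bytes columns are hypothetical, used purely for illustration.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.*;

    public class HiveToSparkSketch {
        public static void main(String[] args) {
            // Hive-enabled session so spark.sql()/spark.table() can see metastore tables
            SparkSession spark = SparkSession.builder()
                    .appName("hive-to-spark-sketch")
                    .enableHiveSupport()
                    .getOrCreate();

            // Illustrative HiveQL:
            //   SELECT status, COUNT(*) AS hits, SUM(bytes) AS total_bytes
            //   FROM web_logs GROUP BY status
            Dataset<Row> viaSql = spark.sql(
                    "SELECT status, COUNT(*) AS hits, SUM(bytes) AS total_bytes "
                  + "FROM web_logs GROUP BY status");
            viaSql.show();

            // Equivalent DataFrame transformations on the same table
            Dataset<Row> viaDf = spark.table("web_logs")
                    .groupBy("status")
                    .agg(count(lit(1)).alias("hits"), sum("bytes").alias("total_bytes"));
            viaDf.show();

            spark.stop();
        }
    }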

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, Hive, YARN, Pig, Sqoop, Kafka, Storm, Flume, Oozie, ZooKeeper, Apache Spark, Apache Tez, Impala, NiFi, Apache Solr, RabbitMQ, Scala

NoSQL Databases: HBase, Cassandra, MongoDB

Programming Languages: C, C++, Java, J2EE, PL/SQL, Pig Latin, Scala, Python

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery, AngularJS

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: Sun Solaris, HP-UNIX, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Version control: SVN, CVS

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

Business Intelligence Tools: Tableau, QlikView, Pentaho, IBM Cognos intelligence

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

Tools and IDEs: Eclipse, NetBeans, Toad, Maven, Ant, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer, IntelliJ

Cloud Technologies: Confidential Web Services (AWS), CDH3, CDH4, CDH5, Hortonworks, Mahout, Microsoft Azure HDInsight, Confidential Redshift

PROFESSIONAL EXPERIENCE

Confidential, Tampa, FL

Data Engineering

Responsibilities:

  • Developed optimal strategies for distributing the web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
  • Designed and developed an ELT data pipeline using a Spark application to fetch data from the legacy system and third-party APIs.
  • Integrated the REST APIs with the Livy server in Java and tested them in the Apache Spark shell and framework.
  • Designed and developed the DMA (Disney Movies Anywhere) dashboard for the BI analyst team.
  • Performed data analytics and loaded data to the Confidential S3 / data lake / Spark cluster.
  • Involved in querying data using Spark SQL on top of the Spark engine.
  • Worked with and learned a great deal from Amazon Web Services (AWS) cloud services like EC2, S3, EBS, RDS, and VPC.
  • Developed Spark scripts using Python shell commands as per the requirements.
  • Worked on data analytics using R, Spark, and Python with machine-learning libraries.
  • Wrote Pig and Hive scripts with UDFs in MapReduce and Python to perform ETL on AWS cloud services.
  • Developed Java UDFs for date conversions and for generating MD5 checksum values (see the UDF sketch after this list).
  • Used Datameer for integration with Hadoop and other sources such as RDBMS (Oracle), SAS, Teradata, and flat files.
  • Wrote a Storm topology to accept events from a Kafka producer and emit them into Cassandra (see the topology sketch after this list).
  • Developed PySpark code to read data from Hive, group the fields, and generate XML files.
  • Worked with text, Avro, Parquet, and sequence file formats.
  • Involved in migrating HiveQL into Impala to minimize query response time.
  • Created Hive tables, dynamic partitions, and buckets for sampling, and worked on them using HQL.
  • Created Hive tables and involved in data loading and writing Hive UDFs.
  • Agile methodology was used for development using XP Practices (TDD, Continuous Integration).
  • Performed Tableau type conversion functions when connected to relational data sources.
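
A minimal sketch of the kind of Java UDF described above for generating an MD5 checksum, using Hive's UDF base class; the class name and the registration shown in the comments are illustrative, not taken from the actual project.

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Registered in Hive with, for example:
    //   ADD JAR md5-udf.jar;
    //   CREATE TEMPORARY FUNCTION md5_hex AS 'Md5ChecksumUDF';
    //   SELECT md5_hex(user_id) FROM web_logs;
    public class Md5ChecksumUDF extends UDF {
        public Text evaluate(Text input) throws Exception {
            if (input == null) {
                return null;  // Hive convention: NULL in, NULL out
            }
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.toString().getBytes(StandardCharsets.UTF_8));
            // Render the 128-bit digest as a zero-padded 32-character hex string
            String hex = String.format("%032x", new BigInteger(1, digest));
            return new Text(hex);
        }
    }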
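
And a sketch of the general shape of the Kafka-to-Cassandra Storm topology mentioned above: a KafkaSpout feeding a bolt that writes each event into Cassandra. The broker address, keyspace, table, and field names are assumptions, and the exact spout and driver APIs vary by Storm and DataStax driver version.

    import java.util.Map;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.kafka.spout.KafkaSpout;
    import org.apache.storm.kafka.spout.KafkaSpoutConfig;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;

    public class EventTopologySketch {

        /** Terminal bolt that writes each Kafka event's value into a Cassandra table. */
        public static class CassandraWriterBolt extends BaseRichBolt {
            private transient Session session;  // not serializable, so created in prepare()
            private OutputCollector collector;

            @Override
            public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
                this.collector = collector;
                // Contact point and keyspace are placeholders
                this.session = Cluster.builder()
                        .addContactPoint("cassandra-host")
                        .build()
                        .connect("events_ks");
            }

            @Override
            public void execute(Tuple tuple) {
                // storm-kafka-client emits the fields: topic, partition, offset, key, value
                String value = tuple.getStringByField("value");
                session.execute("INSERT INTO raw_events (id, payload) VALUES (uuid(), ?)", value);
                collector.ack(tuple);
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // Terminal bolt: nothing is emitted downstream
            }
        }

        public static void main(String[] args) throws Exception {
            KafkaSpoutConfig<String, String> spoutConfig =
                    KafkaSpoutConfig.builder("kafka-broker:9092", "events").build();

            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConfig), 1);
            builder.setBolt("cassandra-writer", new CassandraWriterBolt(), 2)
                   .shuffleGrouping("kafka-spout");

            StormSubmitter.submitTopology("event-ingest", new Config(), builder.createTopology());
        }
    }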

Environment: Languages/Technologies: Java, React.js, Azkaban, Spark SQL, PySpark, Presto, Hive, Beeline, Datameer, Apache Crunch, Elasticsearch, Spring Boot; Special Software: Eclipse, Git repository, Confidential S3, Confidential AWS EC2/EMR, Spark cluster, Hadoop framework, Sqoop, Teradata.

Confidential

Hadoop Developer

Responsibilities:

  • Exported data from DB2 to HDFS using Sqoop and developed MapReduce jobs using the Java API (see the MapReduce sketch after this list).
  • Involved in managing nodes on the Hadoop cluster and monitoring Hadoop cluster job performance using Cloudera Manager.
  • Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and Avro data files, and sequence files for log files.
  • Developed Spark scripts using Python shell commands as per the requirements.
  • Designed and implemented Java engine and API to perform direct calls from front-end JavaScript (ExtJS) to server-side Java methods (ExtDirect).
  • Used Spring AOP to implement distributed declarative transactions throughout the application.
  • Designed and developed Java batch programs in Spring Batch.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Created and maintained Technical documentation for launching Cloudera Hadoop Clusters and for executing Hive queries and Pig Scripts.
  • Developed workflows using Oozie for running MapReduce jobs and Hive queries.
  • Worked on importing and exporting data into HDFS and assisted in exporting analyzed data to RDBMS using Sqoop.
  • Created Java operators to process data using DAG streams and load data to HDFS.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Involved in developing monitoring and performance metrics for Hadoop clusters.
  • Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
  • Optimized the load performance and query performance for ETL jobs by tuning the SQL used in transformations and fine-tuning the database.
  • Defined job flows using the Azkaban scheduler to automate Hadoop jobs and installed ZooKeeper for automatic node failover.
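
A minimal sketch of the kind of MapReduce job written against the Java API as noted above, counting records per status code from delimited rows landed in HDFS by Sqoop; the input layout, delimiter, and field positions are illustrative assumptions.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class StatusCountJob {

        public static class StatusMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            private final Text status = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Rows assumed comma-delimited with the status code in the third column
                String[] fields = value.toString().split(",");
                if (fields.length > 2) {
                    status.set(fields[2].trim());
                    context.write(status, ONE);
                }
            }
        }

        public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                    throws IOException, InterruptedException {
                long total = 0;
                for (LongWritable v : values) {
                    total += v.get();
                }
                context.write(key, new LongWritable(total));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "status-count");
            job.setJarByClass(StatusCountJob.class);
            job.setMapperClass(StatusMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }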

Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Pig, Eclipse, Spark, MySQL, Ubuntu, ZooKeeper, Maven, Jenkins, Java (JDK 1.6), Oracle 10g.

Confidential

Java Developer

Responsibilities:

  • Effectively interacted with team members and business users for requirements gathering.
  • Coded front-end components using HTML, JavaScript, and jQuery; back-end components using Java, Spring, and Hibernate; service-oriented components using RESTful and SOAP-based web services; and rules-based components using JBoss Drools.
  • Implemented the client-side interface using React.js and Angular.js, with Underscore.js for JavaScript templating, function binding, and creating quick indexes.
  • Involved in analysis, design and implementation phases of the software development lifecycle (SDLC).
  • Used React.js for templating to enable faster compilation and the development of reusable components.
  • Implemented Spring core J2EE patterns like MVC, Dependency Injection (DI), and Inversion of Control (IoC) (see the DI sketch after this list).
  • Implemented REST web services with the Jersey API to handle customer requests (see the resource sketch after this list).
  • Developed test cases using JUnit and used Log4j as the logging framework.
  • Worked with HQL and the Criteria API for retrieving data elements from the database.
  • Developed the user interface using HTML, Spring tags, JavaScript, jQuery, and CSS.
  • Developed the application using Eclipse IDE and worked under Agile Environment.
  • Designed and implemented front-end web pages using CSS, JSP, HTML, JavaScript, Ajax, and Struts.
  • Utilized the Eclipse IDE as the development environment to design, develop, and deploy Spring components on WebLogic.
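
A minimal sketch of Spring constructor-based dependency injection and IoC of the kind noted above, using Java configuration; the bean and class names are placeholders for illustration.

    import org.springframework.context.annotation.AnnotationConfigApplicationContext;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    public class DiSketch {

        /** Collaborator resolved and managed by the Spring container. */
        public static class CustomerRepository {
            public String findName(String id) {
                return "Sample Customer " + id;
            }
        }

        /** Service receives its dependency through the constructor (IoC). */
        public static class CustomerService {
            private final CustomerRepository repository;

            public CustomerService(CustomerRepository repository) {
                this.repository = repository;
            }

            public String greet(String id) {
                return "Hello, " + repository.findName(id);
            }
        }

        @Configuration
        public static class AppConfig {
            @Bean
            public CustomerRepository customerRepository() {
                return new CustomerRepository();
            }

            @Bean
            public CustomerService customerService(CustomerRepository customerRepository) {
                // The container injects the repository bean into the service constructor
                return new CustomerService(customerRepository);
            }
        }

        public static void main(String[] args) {
            try (AnnotationConfigApplicationContext ctx =
                     new AnnotationConfigApplicationContext(AppConfig.class)) {
                System.out.println(ctx.getBean(CustomerService.class).greet("42"));
            }
        }
    }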
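
And a sketch of a JAX-RS resource of the kind implemented with the Jersey API above; the path, DTO, and lookup logic are placeholders for illustration.

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;

    // Jersey dispatches GET requests matching /customers/{id} to the method below.
    @Path("/customers")
    public class CustomerResource {

        @GET
        @Path("/{id}")
        @Produces(MediaType.APPLICATION_JSON)
        public Response getCustomer(@PathParam("id") String id) {
            // Placeholder lookup; a real resource would delegate to a service/DAO layer
            Customer customer = findCustomer(id);
            if (customer == null) {
                return Response.status(Response.Status.NOT_FOUND).build();
            }
            return Response.ok(customer).build();
        }

        private Customer findCustomer(String id) {
            return new Customer(id, "Sample Customer");
        }

        // Simple DTO serialized to JSON by the configured provider (e.g. Jackson)
        public static class Customer {
            public String id;
            public String name;

            public Customer(String id, String name) {
                this.id = id;
                this.name = name;
            }
        }
    }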

Environment: Java, J2EE, JDBC, EJB, UML, Swing, HTML, JavaScript, CSS, jQuery, Spring 3.0, JNDI, Hibernate 3.0, JavaMail, Web Services, REST, Oracle 10g, JUnit, Log4j, Eclipse, WebLogic 10.3.
