
Big Data Engineer Resume

Dallas, TX

SUMMARY:

  • 6 years of experience in the Information Technology (IT) industry, including 2+ years of hands-on experience with Big Data ecosystem technologies such as Hadoop, MapReduce, Spark, Hive, HBase, Sqoop, Kafka, Oozie, Cassandra and Flume
  • Skilled at developing new applications on Hadoop according to business needs and migrating existing applications to the Hadoop environment.
  • Used NiFi to transform data between components of the Big Data ecosystem.
  • Used Spark Streaming and Kafka to process real-time data.
  • Wrote custom UDFs in Java to extend Hive core functionality.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Worked with RDBMS including MySQL and Oracle.
  • Worked with NoSQL databases including HBase, MongoDB and Cassandra.
  • Developed simple to complex MapReduce and streaming jobs using Scala and Java for data cleansing, filtering and aggregation.
  • Extensive hands-on experience with programming languages including Java, Python and Scala.
  • Proficient in writing HiveQL and SQL queries to achieve data manipulation
  • Performed data transformations across formats including SequenceFile, flat files, XML, JSON, Avro, Parquet and relational tables.
  • Strong in core Java, including Object-Oriented Design (OOD) and Java components such as the Collections Framework, exception handling and the I/O system.
  • Adept at using Sqoop to migrate data between RDBMS, NoSQL databases and HDFS
  • Developed real-time read/write access to very large datasets via HBase.
  • Consolidated MapReduce jobs by implementing Spark.
  • Experience with Apache Spark with Scala, Python and Java
  • Good knowledge of scheduling batch job workflow using Oozie
  • Familiar with development tools and methodologies including JIRA, Agile/Scrum and Waterfall.
  • Experience in collecting, aggregating and moving large amounts of streaming data using Flume, Kafka, Spark Streaming.
  • Demonstrated ability to communicate and gather requirements, partner with enterprise architects, business users, analysts and development teams to deliver rapid iterations of complex solutions.
  • Proficient in Data Visualization by creating multiple dashboards using Tableau
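The MapReduce-style cleansing, filtering and aggregation work described above can be sketched in plain Python. This is an illustrative sketch only: the records and field names are hypothetical, and the production jobs ran on Hadoop in Scala and Java.

```python
from collections import defaultdict

# Hypothetical records: (user_id, event, amount) — illustrative data only
records = [
    ("u1", "purchase", 30.0),
    ("u2", "purchase", 12.5),
    ("u1", "refund", -5.0),
    ("u2", "purchase", 7.5),
]

def map_phase(record):
    """Map step: emit (key, value) pairs, filtering out refunds."""
    user, event, amount = record
    if event == "purchase":  # data cleansing / filtering
        yield (user, amount)

def reduce_phase(pairs):
    """Reduce step: aggregate values per key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

pairs = [kv for rec in records for kv in map_phase(rec)]
totals = reduce_phase(pairs)
print(totals)  # {'u1': 30.0, 'u2': 20.0}
```

In a real MapReduce job the shuffle phase would group pairs by key across nodes; here the grouping happens inside `reduce_phase`.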

TECHNICAL SKILLS:

Hadoop Ecosystem: Apache Hadoop 2.5, Hive, Pig, HBase, Sqoop, Spark 1.6, Kafka, Oozie, Zookeeper

Databases: Oracle, MySQL, SQL, MongoDB, Cassandra

Languages: Python, Java, Scala, SQL, R

Visualization: Tableau, R

Web Technologies: HTML, CSS

PROFESSIONAL EXPERIENCE:

Confidential, Dallas, TX

Big Data Engineer

Responsibilities:

  • Developed data pipeline using Kafka, Sqoop, Hive and Java MapReduce to ingest data into HDFS for analysis.
  • Developed design documents covering all viable approaches and identifying the best one.
  • Aggregated and stored the resulting data in HDFS and HBase.
  • Managed data coming from different sources.
  • Developed business logic using Scala
  • Collected incoming data in real time and processed it with Spark Streaming and Spark SQL.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data from various sources.
  • Developed scripts to automate end-to-end data management and synchronization across all clusters.
  • Worked with SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Developed functional programs in Scala for connecting the streaming data application and gathering web data.
  • Implemented the workflows using Apache Oozie framework to automate tasks
  • Worked in an Agile environment. Effectively communicated with different levels of the management.
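The real-time processing described above follows a sliding-window aggregation pattern. The plain-Python sketch below illustrates that pattern with hypothetical event data; the production version used Spark Streaming (e.g. its window operations) over Kafka topics.

```python
# Hypothetical event stream: (timestamp_sec, value) tuples — illustrative only
events = [(0, 10), (1, 20), (2, 30), (3, 40), (4, 50), (5, 60)]

def windowed_sums(stream, window=3, slide=1):
    """Sliding-window aggregation, analogous to windowed reductions
    in Spark Streaming: sum all values whose timestamp falls in
    each [start, end) window."""
    results = []
    for end in range(window, stream[-1][0] + 2, slide):
        start = end - window
        total = sum(v for t, v in stream if start <= t < end)
        results.append((start, end, total))
    return results

for start, end, total in windowed_sums(events):
    print(f"window [{start}, {end}): sum = {total}")
```

Spark Streaming performs the same computation incrementally over distributed micro-batches rather than rescanning the full stream.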

Environment: Hadoop 2.5, Hive 1.2, Pig 0.16.0, Spark 1.6, Scala 2.11.8, MapReduce, HBase 1.1.2, Sqoop 1.4.6, Kafka 0.10.0.1

Confidential, Dallas, TX

Big Data Engineer

Responsibilities:

  • Created a production data lake capable of handling transactional processing operations using the Hadoop ecosystem.
  • Built a data ingestion layer using Spark and Sqoop on a distributed cluster.
  • Migrated data from various relational platforms to Hadoop and built a data warehouse using Hive, Oozie and Sqoop.
  • Prepared an ETL pipeline with Sqoop and Hive to regularly ingest data from the source and make it available for consumption.
  • Configured periodic incremental imports of data from Oracle into HDFS using Sqoop.
  • Worked extensively with structured data using HiveQL, including join operations and custom UDFs, and optimized Hive queries.
  • Implemented Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
  • Developed Spark and Hive jobs to summarize and transform data.
  • Loaded and transformed large sets of structured data using Spark.
  • Gathered requirements from the client and estimated timelines for developing complex Hive queries for logistics applications.
  • Created Hive tables, loaded data, and wrote Hive queries to analyze user request patterns; implemented performance optimizations including partitioning and bucketing.
  • Set up Oozie workflows for Hive/Sqoop actions.
  • Helped design HDFS storage to maintain an efficient number of block replicas.
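The Hive partitioning and bucketing mentioned above can be illustrated with a small Python sketch. The table layout, column names and hash function here are hypothetical; Hive itself lays out one directory per partition value and assigns each row to a bucket file by hashing the bucketing column.

```python
# Hypothetical rows: (user_id, event_date) — illustrative data only
rows = [("u1", "2016-01-01"), ("u2", "2016-01-01"), ("u3", "2016-01-02")]

NUM_BUCKETS = 4

def hash_bucket(key, n=NUM_BUCKETS):
    # Stable toy hash (Python's built-in hash() of str is salted per run)
    return sum(ord(c) for c in key) % n

def placement(user_id, event_date):
    """Mimic Hive's on-disk layout: partition directory + bucket number."""
    partition = f"event_date={event_date}"
    return partition, hash_bucket(user_id)

layout = {}
for user, date in rows:
    layout.setdefault(placement(user, date), []).append(user)
print(layout)
```

Partition pruning lets queries filtered on `event_date` skip whole directories, and bucketing supports efficient sampling and bucketed map joins.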

Environment: Hive 1.2, Sqoop 1.4.6, Hadoop 2.5, Oozie 4.2.0, Spark 1.6, Oracle, Scala 2.11.8

Confidential

Python Developer

Responsibilities:

  • Implemented scalable applications for information identification, extraction, analysis, retrieval.
  • Directed software design and development while remaining focused on client needs.
  • Collaborated closely with other team members to plan, design and develop robust solutions.
  • Interfaced with business analysts, developers and technical support to determine optimal specifications.
  • Evaluated interface between hardware and software.
  • Advised customers regarding maintenance of diverse software systems.

Environment: Ubuntu Linux, Python, OpenCV, Twilio, Raspberry Pi

Confidential

Python Developer

Responsibilities:

  • Designed a dynamic, interactive website that ensured a positive customer experience, resulting in a 40% increase in revenue.
  • Developed, tested and debugged software tools.
  • Implemented website functionality using class-based views and models to store data in SQLite database.
  • Developed the website using Python and the Django web framework, with HTML template tags, JavaScript and Bootstrap on the front end.
  • Implemented test programs and evaluated existing engineering processes.
  • Designed and configured database and back end application programs.
  • Performed research to explore and identify new technological platforms.
  • Collaborated with internal teams to convert end user feedback into meaningful and improved solutions.
  • Resolved ongoing problems and accurately documented progress of project.
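The SQLite-backed storage mentioned above can be sketched with Python's standard library. The actual site used Django's ORM and class-based views; the table name and data below are hypothetical.

```python
import sqlite3

# Minimal sketch: store site data in SQLite and query an aggregate.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feedback (id INTEGER PRIMARY KEY, user TEXT, rating INTEGER)"
)
conn.executemany(
    "INSERT INTO feedback (user, rating) VALUES (?, ?)",
    [("alice", 5), ("bob", 3)],
)
conn.commit()

avg = conn.execute("SELECT AVG(rating) FROM feedback").fetchone()[0]
print(avg)  # 4.0
```

Django models compile to equivalent SQL; a `Feedback` model with a `rating` field and `Feedback.objects.aggregate(Avg("rating"))` would produce the same query.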

Environment: Ubuntu Linux, Python, Django web framework, HTML, CSS, Bootstrap, JavaScript

Confidential

Junior Java Developer

Responsibilities:

  • Participated in requirements analysis and the design of documentation.
  • Involved in development of core modules like ticket reservation, payment, user registration and hotel reservation.
  • Developed the application as per the functional requirements from the analysts.
  • Integrated SOAP web services and mapped the responses to display to the user interface.
  • Involved in designing the entire database for the application.
  • Involved in developing persistence layer using JDBC, SQL and stored procedures.
  • Developed presentation tier using JSP, Servlets, HTML, CSS, JavaScript and jQuery.
  • Deployed the application on the JBoss server.
  • Used Subversion (SVN) for version control, including source code check-ins and check-outs.
  • Participated in scrum meetings as a part of Agile Methodology.

Environment: JSP, Servlets, Java 1.7, JDBC, HTML, CSS, JavaScript, Eclipse, SOAP, JBoss
