
Big Data Engineer Resume


Chicago, IL

SUMMARY

  • 9+ years of IT experience in analysis, design, development, implementation, maintenance, and support, including Big Data/Hadoop development, ecosystem analytics, and the design of Java-based enterprise applications.
  • Around 5 years of experience with Hadoop ecosystem components: HDFS, MapReduce (MRv1, YARN), Pig, Hive, HBase, Sqoop, Flume, Kafka, Impala, and Oozie; programming in Spark using Scala; and exposure to Cassandra.
  • Developed analytical components using Kafka, Scala, Spark SQL and Spark Streaming.
  • Experience working with Cloudera distributions.
  • Experience in using Apache Flume for collecting, aggregating and moving large amounts of data.
  • Experience importing and exporting data with Sqoop between HDFS and relational database systems (RDBMS).
  • Experience in data analysis using Hive, Pig Latin, and Impala.
  • Experienced in using Hive and Pig scripts for transformations, event joins, filters, and pre-aggregations before storing data in HDFS.
  • Experience with UNIX shell scripts to process SAS jobs, search strings, set execute permissions on directories, etc.
  • ETL/data warehouse experience, primarily with SQL Server/T-SQL.
  • Experienced with NoSQL databases like HBase, with exposure to Cassandra.
  • Experience in writing custom UDFs in Java to extend Hive and Pig functionality (see the sketch after this list).
  • Experience with multiple file formats, including JSON, text, XML, Avro, and SequenceFile.
  • Experience in Data Ingestion, In-Stream data processing, Batch Analytics and Data Persistence Strategy.
  • Extensive experience in handling structured, semi-structured, and unstructured data.
  • Worked on performance tuning of Hadoop jobs using techniques such as map-side joins, partitioning, and bucketing.
  • Experience working with Java, J2EE, Spring, Hibernate, and web services (SOAP and REST).
  • Experience in working with web technologies like HTML, CSS, JavaScript and XML.
  • Performed Unit testing with JUnit framework for the applications.
  • Experience working with Oracle and MySQL databases.
  • Experience in working with SDLC methodologies like Agile and Waterfall.
  • Worked in Windows, Unix/Linux platforms.
  • Strong analytical and problem solving skills.
  • Good interpersonal skills and the ability to work as part of a team.
  • Ability to learn and master new technologies.
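
For illustration, a minimal sketch of the kind of custom Java UDF for Hive mentioned above; the class name and normalization logic are hypothetical examples, not code from an actual engagement:

    // Hypothetical Hive UDF: normalize free-form ZIP codes to five digits.
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public final class NormalizeZip extends UDF {
        private final Text result = new Text();

        // Hive calls evaluate() once per row; returning null yields SQL NULL.
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            String digits = input.toString().replaceAll("[^0-9]", "");
            if (digits.length() < 5) {
                return null;
            }
            result.set(digits.substring(0, 5));
            return result;
        }
    }

A UDF like this is packaged into a jar, registered with ADD JAR and CREATE TEMPORARY FUNCTION, and then usable in HiveQL like any built-in function.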

TECHNICAL SKILLS

Programming Languages and Web Technologies: C, C++, Java, J2EE, Scala, Spring, Hibernate, Web Services, HTML, CSS, JavaScript, XML, SQL and PL/SQL

Database management systems: Oracle, MySQL

Operating Systems: Windows XP/2000 and higher, Red Hat Linux, CentOS

Big Data Ecosystems: HDFS, Map Reduce, Oozie, Hive, Pig, Sqoop, Flume, Zookeeper, Kafka, HBase, Spark SQL and Spark Streaming

Methodologies: Agile and Waterfall

Tools and IDEs: SVN, Git, JIRA, Maven, WinSCP, PuTTY and Eclipse

PROFESSIONAL EXPERIENCE

Confidential, Malvern, PA

Sr. Big Data Engineer

RESPONSIBILITIES:

  • Worked with Hadoop Ecosystem components like HBase, Sqoop, Zookeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.
  • Worked on data serialization formats, converting complex objects into sequences of bytes using Avro, Parquet, JSON, and CSV formats.
  • Developed Pig and Hive UDFs in Java to extend Pig and Hive functionality.
  • Wrote Pig scripts for sorting, joining, filtering, and grouping data.
  • Created Hive tables, loaded data and wrote Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive (see the sketch after this list).
  • Maintained the configuration files of UNIX servers (AIX, HP-UX, Solaris, and Linux), keeping the servers security compliant.
  • Extracted high-volume data sets from Salesforce.com (SFDC) using Informatica ETL mappings and SQL/PL-SQL scripts and loaded them into the data warehouse.
  • Used UNIX commands to create, modify, and remove accounts on client servers, ensuring access was granted only to authorized users.
  • Created a Hive aggregator to update the Hive table after running the data profiling job.
  • Issued SQL queries via Impala to process the data stored in HDFS and HBase.
  • Involved in developing Impala scripts for the extraction, transformation, and loading of data into the data warehouse.
  • Used Sqoop to ingest from DBMS and Python to ingest logs from client data centers; developed Python and Bash scripts for automation.
  • Implemented MapReduce jobs using the Java API and Spark jobs using Python.
  • Exported the analyzed data to databases such as Teradata, MySQL, and Oracle using Sqoop for visualization and to generate reports for the BI team.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Used Cassandra to store the analyzed and processed data for scalability.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the HDFS and to run multiple Hive and Pig jobs.
  • Developed Oozie workflows that were scheduled to run monthly.
  • Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Involved in the Big Data requirements review meetings and partnered with business analysts to clarify any specific scenarios.
  • Involved in daily meetings to discuss the development/progress and was active in making meetings more productive.
  • Configured and optimized the Cassandra cluster and developed a real-time Java-based application to work with the Cassandra database.
  • Involved in cluster coordination services through ZooKeeper and in adding new nodes to an existing cluster.
  • Provided team leadership in specialized technical areas and was frequently sought out by others, with significant experience both in a single complex development environment and across multiple platforms.
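
As a rough illustration of the partitioning and bucketing work above, the sketch below issues Hive DDL over JDBC; the host, table, and column names are invented for the example:

    // Hypothetical sketch: create a partitioned, bucketed Hive table and
    // load it with a dynamic-partition insert over the HiveServer2 JDBC driver.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HivePartitionDemo {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hive-server:10000/default");
                 Statement stmt = conn.createStatement()) {

                // Session settings required for dynamic partition inserts.
                stmt.execute("SET hive.exec.dynamic.partition = true");
                stmt.execute("SET hive.exec.dynamic.partition.mode = nonstrict");

                // Partition by date; bucket by account id to help map-side joins.
                stmt.execute("CREATE TABLE IF NOT EXISTS trades_part ("
                    + "account_id STRING, amount DOUBLE) "
                    + "PARTITIONED BY (trade_date STRING) "
                    + "CLUSTERED BY (account_id) INTO 32 BUCKETS "
                    + "STORED AS PARQUET");

                // Hive routes each row to its partition using the last column.
                stmt.execute("INSERT INTO TABLE trades_part PARTITION (trade_date) "
                    + "SELECT account_id, amount, trade_date FROM trades_staging");
            }
        }
    }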

Environment: Hadoop, MapReduce, Flume, Impala, HDFS, HBase, Hive, Pig, Sqoop, Oozie, ZooKeeper, Cassandra, Teradata, MySQL, Oracle, Scala, Java, UNIX shell scripting, AWS.

Confidential, Chicago, IL

Big Data Engineer

RESPONSIBILITIES:

  • Developed data pipeline using Spark, Hive and HBase to ingest customer behavioral data and financial histories into Hadoop cluster for analysis.
  • Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS (see the sketch after this list).
  • Hands-on experience in designing, developing, and maintaining software solutions in a Hadoop cluster.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and Spark on YARN.
  • Used Spark Streaming to ingest data into the Spark engine.
  • Designed the ETL run performance tracking sheet for different phases of the project and shared it with the production team.
  • Imported data from different sources, such as AWS S3 and the local file system, into Spark RDDs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Used Sqoop to import and export data between RDBMS and HDFS.
  • Used Hive to analyze the Partitioned and Bucketed data and compute various metrics for reporting.
  • Involved in developing Hive DDLs to create, alter, and drop Hive tables.
  • Involved in loading data from Linux file system to HDFS.
  • Integrated Cassandra with Talend and automated jobs.
  • Worked on cloud computing infrastructure (e.g., Amazon Web Services EC2) and considerations for scalable, distributed systems.
  • Involved in data warehousing and Business Intelligence systems.
  • Responsible for System performance management, Systems change / configuration management and Business requirements management.
  • Primary contributor in designing, coding, testing, debugging, documenting, and supporting all types of applications consistent with established specifications and business requirements to deliver business value; identified and designed the most efficient and cost-effective solutions through research and evaluation of alternatives.
  • Demonstrated Hadoop practices and broad knowledge of technical solutions, design patterns, and code for medium/complex applications deployed in Hadoop production.
  • Ingested semi-structured data using Flume and transformed it using Pig.
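
The Spark Streaming flow referenced above can be sketched as follows; the project itself used Scala, so this Java rendering, along with the S3 path, batch interval, and record layout, is only an assumed illustration:

    // Hypothetical sketch: watch an S3 prefix for new files with Spark
    // Streaming, filter malformed records, and persist each batch to HDFS.
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class LearnerIngest {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("learner-ingest");
            JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(60));

            // Picks up files as they land under the (hypothetical) S3 prefix.
            JavaDStream<String> lines =
                jssc.textFileStream("s3a://example-bucket/incoming/");

            // On-the-fly cleanup: keep well-formed, non-empty records only.
            JavaDStream<String> cleaned = lines.filter(
                line -> !line.isEmpty() && line.split(",").length >= 3);

            // Persist each micro-batch to HDFS under a per-batch directory.
            cleaned.foreachRDD((rdd, time) -> {
                if (!rdd.isEmpty()) {
                    rdd.saveAsTextFile("hdfs:///data/learner/" + time.milliseconds());
                }
            });

            jssc.start();
            jssc.awaitTermination();
        }
    }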

Environment: Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Kafka, Cloudera, Scala IDE (Eclipse), Scala, Maven, HDFS.

Confidential, Cypress, CA

Java/Big Data Engineer

RESPONSIBILITIES:

  • Optimized mappings using various optimization techniques and debugged existing mappings using the Debugger to test and fix them.
  • Worked with developers to design scalable, supportable infrastructure.
  • Worked with the Linux server admin team in administering the server hardware and operating system.
  • Assisted with developing and maintaining the system run books.
  • Performed Hadoop development and implementation, and loaded data from disparate data sets.
  • Translated complex functional and technical requirements into detailed design.
  • Performed analysis of vast data stores to uncover insights while maintaining security and data privacy.
  • Created scalable, high-performance web services for data tracking and performed high-speed querying.
  • Developed optimal strategies for distributing the web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
  • Collected and aggregated large amounts of web log data from different sources, such as web servers and mobile and network devices, using Apache Flume, and stored the data in HDFS for analysis (see the sketch after this list).
  • Monitored multiple Hadoop clusters environments using Ganglia.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Developed Pig scripts for the analysis of semi-structured data.
  • Developed industry-specific UDFs (user-defined functions).
  • Participated in a POC effort to help build new Hadoop clusters.
  • Tested prototypes, oversaw handover to operational teams, and proposed best practices and standards.
  • Created and published various production metrics including system performance and reliability information to systems owners and management.
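
As an illustration of the web-log aggregation described above, a minimal MapReduce sketch follows; it assumes a space-delimited access-log line whose seventh field is the request path, which is an assumption for the example, not the project's actual format:

    // Hypothetical sketch: count hits per URL from raw web-server logs.
    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class UrlHitCount {
        public static class LogMapper
                extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            private final Text url = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(" ");
                if (fields.length > 6) {   // skip malformed lines
                    url.set(fields[6]);    // request path in common log format
                    ctx.write(url, ONE);
                }
            }
        }

        public static class HitReducer
                extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> counts,
                                  Context ctx)
                    throws IOException, InterruptedException {
                long total = 0;
                for (LongWritable c : counts) {
                    total += c.get();
                }
                ctx.write(key, new LongWritable(total));
            }
        }
    }

The mapper and reducer would be wired up through a standard Hadoop Job driver and run against the log directory in HDFS.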

Environment: Spark, Hadoop, HDFS, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Kafka, Cloudera, Scala IDE (Eclipse), Scala, Maven.

Confidential, Pittsburgh, PA

Java Developer

RESPONSIBILITIES:

  • Developed, tested, implemented, and maintained application software following established processes.
  • Communicated effectively with other engineers and QA.
  • Responsible for gathering and analyzing requirements and converting them into technical specifications.
  • Created use case diagrams, sequence diagrams, functional specifications, and user interface diagrams using StarUML.
  • Involved in complete requirement analysis, design, coding and testing phases of the project.
  • Participated in JAD meetings to gather the requirements and understand the End Users System.
  • Developed user interfaces using JSP, HTML, XML and JavaScript.
  • Generated XML schemas and used XMLBeans to parse XML files.
  • Created stored procedures and functions; used JDBC to process database calls to DB2/AS400 and SQL Server databases (see the sketch after this list).
  • Developed code to create XML files and flat files from data retrieved from databases and XML files.
  • Established, refined, and integrated development and test environment tools and software as needed.
  • Identified production and non-production application issues.
  • Assisted in providing estimates for break/fix and problem-management activities.
  • Identified application performance improvement initiatives and implemented them in consultation with the Service Manager.
  • Communicated independently with business partners to resolve issues and negotiate project-related items such as priorities and completion deadlines.
  • Documented suggestions for continuous process improvement for both OM and development activities.
  • Provided timely updates on tasks to the reporting supervisor.
  • Adhered to the defined delivery and OM delivery processes and guidelines (SLA compliance, etc.).
  • Assisted the Service Manager and Service Director in providing management status reports on a daily, weekly, and monthly basis.
  • Ensured that all documentation was updated and that work products met UDP/SOX compliance requirements.
  • Conducted or facilitated root cause analysis on all in-scope incidents and recommended corrective action plans.
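
The stored-procedure work above can be sketched with standard JDBC; the connection URL, credentials, and procedure signature here are hypothetical:

    // Hypothetical sketch: call a SQL Server stored procedure via JDBC
    // with one IN and one OUT parameter (requires the vendor JDBC driver).
    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Types;

    public class OrderStatusDao {
        public String fetchStatus(String orderId) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:sqlserver://db-host:1433;databaseName=orders",
                     "app_user", "app_password");
                 CallableStatement call =
                     conn.prepareCall("{call GET_ORDER_STATUS(?, ?)}")) {

                call.setString(1, orderId);                  // IN parameter
                call.registerOutParameter(2, Types.VARCHAR); // OUT parameter
                call.execute();
                return call.getString(2);
            }
        }
    }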

Environment: Java, SQL, Eclipse, Oracle, MySQL, Windows NT, Linux.

Confidential

Java/Hadoop Developer

RESPONSIBILITIES:

  • Developed high-level design documents, Use case documents, detailed design documents and Unit Test Plan documents and created Use Cases, Class Diagrams and Sequence Diagrams using UML.
  • Extensive involvement in database design, development, and coding of stored procedures, DDL and DML statements, functions, and triggers.
  • Utilized Hibernate for object/relational mapping for transparent persistence onto SQL Server.
  • Used Spring IoC to create beans injected at run time (see the sketch after this list).
  • Modified the existing JSP pages using JSTL.
  • Used Spring Tool Suite (STS) as the IDE for development.
  • Used jQuery for client-side JavaScript methods.
  • Built a custom cross-platform architecture using Java, Spring Core/MVC, Hibernate through Eclipse IDE.
  • Involved in writing PL/SQL for the stored procedures.
  • Designed UI screens using JSP, Struts tags, HTML, and jQuery.
  • Used JavaScript for client-side validation.
  • Developed a mobile web application using Spring 3.0 MVC, dependency injection, and Spring JdbcTemplate for Oracle database operations.
  • Used Log4J for logging.
  • Invoked REST services to fetch application data from various backend systems and displayed static content by making service calls to the database service.
  • Developed mobile Android application code by invoking SOAP services for data and integrating a hybrid mobile web (HTML/JS) UI.
  • Involved in the redesign and development of the mobile web application.
  • Used the Spring Framework to build an MVC architecture and separate presentation from business logic.
  • Used various design patterns, such as Façade, Service Delegate, Factory, Singleton, and DAO.
  • Involved in supporting the existing mobile web application, including defect fixes and enhancements.
  • Worked on writing JUnit test cases.
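
The Spring IoC usage above amounts to letting the container construct and inject beans instead of calling "new" directly; a minimal annotation-based sketch follows (the bean and service names are hypothetical, and the project itself used Spring 3.0 configuration):

    // Hypothetical sketch: the container builds GreetingService and hands it
    // out on request, so callers never instantiate it themselves.
    import org.springframework.context.annotation.AnnotationConfigApplicationContext;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    class AppConfig {
        @Bean
        public GreetingService greetingService() {
            return new GreetingService("Hello");
        }
    }

    class GreetingService {
        private final String prefix;
        GreetingService(String prefix) { this.prefix = prefix; }
        String greet(String name) { return prefix + ", " + name; }
    }

    public class Main {
        public static void main(String[] args) {
            AnnotationConfigApplicationContext ctx =
                new AnnotationConfigApplicationContext(AppConfig.class);
            GreetingService svc = ctx.getBean(GreetingService.class);
            System.out.println(svc.greet("world"));   // prints "Hello, world"
            ctx.close();
        }
    }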

Environment: Java, Workbench, J2EE, Eclipse, Hibernate, CVS, JavaScript, HTML, XML, Maven, SQL, Oracle, web services (SOAP, REST), Spring.
