Big Data Engineer Resume

Chicago, IL

SUMMARY

  • Big Data Engineer/Hadoop Developer with 6+ years of experience in business processes, design strategies, data analytics solution development, and workflow implementation.
  • Hands-on experience with Hadoop and Spark Big Data technologies, covering storage, querying, processing, and analysis of data.
  • Experienced in using Hadoop ecosystem tools such as MapReduce, Hive, Sqoop, Impala, Avro, and HDFS.
  • Worked extensively with Python, Java, and databases such as MySQL, Oracle, PostgreSQL, and Microsoft SQL Server.
  • Hands-on experience with Hadoop cluster managers and tools such as Cloudera Manager, Apache Ambari, and Hue.
  • Experienced in developing SQL, Python, and shell scripts to schedule recurring processes.
  • Proficient with the Git version control system for code sharing and collaboration.
  • Experienced in creating ad-hoc and summary reports using advanced Excel, SQL, and Tableau.
  • Experienced in collecting log data from various sources and integrating it into HDFS using Flume.
  • Experienced in validating data in HDFS and Hive for each data transaction.
  • Experienced in importing and exporting data between HDFS and relational database systems using Sqoop.
  • Extensive knowledge of programming with Resilient Distributed Datasets (RDDs); see the sketch after this list.
  • Experienced in using Flume to transfer log data files to the Hadoop Distributed File System (HDFS).
  • Good experience in shell programming.
  • Knowledgeable in managing Cloudera's Hadoop platform and CDH clusters.
  • Proficient in writing complex SQL queries against databases such as Oracle, SQL Server, PostgreSQL, and MySQL.
  • Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
  • Experience working with operational data sources and migrating data from traditional databases to Hadoop.
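
A minimal sketch of the kind of RDD-based processing referenced above; the HDFS paths, log layout, and application name are assumptions for illustration, not details from the resume:

```python
from pyspark import SparkContext

# Minimal RDD sketch; paths and log format are hypothetical.
sc = SparkContext(appName="error-log-counts")

logs = sc.textFile("hdfs:///data/logs/app/*.log")            # load raw log lines from HDFS
errors = logs.filter(lambda line: "ERROR" in line)           # keep only error lines
counts = (errors
          .map(lambda line: (line.split(" ")[0], 1))         # key by the first token, e.g. the date
          .reduceByKey(lambda a, b: a + b))                  # count errors per key

counts.saveAsTextFile("hdfs:///data/reports/error_counts")   # write results back to HDFS
sc.stop()
```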

TECHNICAL SKILLS

Databases: MySQL, Microsoft SQL Server, Oracle 11g/12c, PostgreSQL, Teradata, IBM DB2

Programming Skills: Python, Pandas, NumPy, NLTK, scikit-learn, HTML, Java

Hadoop/Big Data: HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, Hue, ZooKeeper.

Programming Languages: C, Java, PL/SQL, Pig Latin, Python, HiveQL, Scala

Java/J2EE & Web Technologies: J2EE, EJB, JSF, Servlets, JSP, JSTL, CSS, HTML, XHTML, XML, AngularJS, AJAX, JavaScript, jQuery.

Development Tools: Eclipse, NetBeans, SVN, Git, Ant, Maven, SoapUI, JMX Explorer, XMLSpy, QC, QTP, Jira, SQL Developer, TOAD.

Methodologies: Agile/Scrum, UML, Rational Unified Process and Waterfall.

NoSQL Technologies: Cassandra, MongoDB, HBase.

Frameworks: Struts, Hibernate, and Spring MVC.

Scripting Languages: Unix Shell Scripting, Perl.

Distributed Platforms: Hortonworks, Cloudera, MapR

Operating Systems: Windows XP/Vista/7/8/10, UNIX, Linux

Software Package: MS Office 2007/2010/2016.

Web/Application Servers: WebLogic, WebSphere Application Server, Apache Tomcat

Visualization: Tableau, QlikView, MicroStrategy, and MS Excel.

Version Control: CVS, SVN, Git, TFS.

Web Technologies: HTML, XML, CSS, JavaScript, jQuery, AJAX, AngularJS, SOAP, REST, and WSDL.

PROFESSIONAL EXPERIENCE

Confidential, Chicago, IL

Big Data Engineer

Responsibilities:

  • Developed analytical solutions, data strategies, tools, and technologies for the marketing platform using Big Data technologies.
  • Implemented solutions for ingesting data from various sources using Big Data technologies such as Hadoop, the MapReduce framework, Sqoop, and Hive.
  • Worked as a Hadoop consultant on technologies such as MapReduce, Pig, Hive, and Sqoop.
  • Worked with the PySpark API.
  • Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig, and MapReduce.
  • Built real-time and near-real-time analytics on big data platforms such as Hadoop and Spark using Python.
  • Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers.
  • Wrote Hadoop jobs to analyze data in text, sequence, and Parquet file formats using Hive and Pig.
  • Analyzed the Hadoop cluster and various Big Data components, including Pig, Hive, Spark, Impala, and Sqoop.
  • Developed Spark code using Python and Spark SQL for faster testing and data processing.
  • Monitored metrics and created backend reports and dashboards in Tableau.
  • Developed predictive analytics using PySpark APIs.
  • Performed big data analysis using Pig and Hive.
  • Created Hive external tables, loaded data into them, and queried the data using HQL.
  • Imported millions of structured records from relational databases using Sqoop, stored them in HDFS in CSV format, and processed them with Spark.
  • Used Spark SQL to process large volumes of structured data (see the sketch after this list).
  • Extracted data from MySQL and AWS Redshift into HDFS using Sqoop.
  • Worked with tools such as Flume, Sqoop, Hive, and PySpark.
  • Wrote business analytics scripts using HiveQL.
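
A minimal sketch of the Sqoop-to-Spark SQL flow described above: reading Sqoop-landed CSV files from HDFS, registering a temporary view, and aggregating with Spark SQL. The HDFS paths, column names, and query are assumptions for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders-spark-sql").getOrCreate()

# CSV files previously written to HDFS by a Sqoop import; path and schema are hypothetical.
orders = (spark.read
          .option("header", "false")
          .option("inferSchema", "true")
          .csv("hdfs:///landing/sqoop/orders/")
          .toDF("order_id", "customer_id", "amount", "order_date"))

orders.createOrReplaceTempView("orders")

# Aggregate with Spark SQL instead of pushing the work back to the source database.
daily_totals = spark.sql("""
    SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_date
""")

daily_totals.write.mode("overwrite").csv("hdfs:///marts/orders/daily_totals")
spark.stop()
```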

Environment: Big Data, Spark, YARN, Hive, Flume, Pig, Python, Hadoop, AWS, Databases, Redshift.

Confidential, Malvern, PA

Big Data Developer

Responsibilities:

  • Worked as a Big Data Developer on the team handling the firm's proprietary platform issues, providing data analysis for the team and developing enhancements.
  • Worked with large sets of big data, including various security logs.
  • Loaded data from relational databases into HDFS using Sqoop and handled flat files from different vendors, including text and XML data.
  • Developed MapReduce jobs for data cleaning and manipulation.
  • Migrated data from existing RDBMSs (MySQL and SQL Server) to Hadoop using Sqoop for processing and analysis.
  • Implemented solutions using AWS services such as EC2, S3, and Redshift.
  • Performed file system management and monitoring on Hadoop log files.
  • Wrote Pig and Hive jobs to extract data from MongoDB through Sqoop and place it in HDFS.
  • Used Flume to collect, aggregate, and store web log data from sources such as web servers, mobile devices, and network devices, and pushed it to HDFS.
  • Developed DataFrames using Spark SQL as needed.
  • Wrote Hive join queries to fetch information from multiple tables and MapReduce jobs to collect data from Hive.
  • Used Hive to analyze partitioned and bucketed data and compute metrics for dashboard reporting (see the sketch after this list).
  • Developed code for importing and exporting data into HDFS and Hive using Sqoop.
  • Configured and maintained the cluster, and managed and reviewed Hadoop log files.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Analyzed large amounts of data to determine the optimal way to aggregate it and reported the findings.
  • Explored methods for improving the performance and optimization of existing Hadoop jobs with the Spark framework, using SparkContext, Spark SQL, DataFrames, and YARN.
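
A minimal sketch of computing dashboard metrics from a partitioned Hive table with Spark SQL, as described above; the database, table, column names, and date filter are assumptions for illustration:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("weblog-metrics")
         .enableHiveSupport()       # read tables registered in the cluster's Hive metastore
         .getOrCreate())

# Filtering on the partition column (event_date) lets Hive/Spark prune partitions
# instead of scanning the whole table; all names here are hypothetical.
metrics = spark.sql("""
    SELECT event_date, status_code, COUNT(*) AS hits
    FROM security_logs.web_logs
    WHERE event_date >= '2018-01-01'
    GROUP BY event_date, status_code
""")

metrics.write.mode("overwrite").saveAsTable("security_logs.web_log_metrics")
spark.stop()
```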

Environment: MySQL, SQL Server, Python, Hadoop, HDFS, Hive, MapReduce, Cloudera, Pig, Sqoop, Impala, Flume, PySpark, Spark SQL.

Confidential

Java Developer

Responsibilities:

  • Developed the web interface using Struts, JavaScript, HTML, and CSS.
  • Extensively used Struts controller component classes to develop the applications.
  • Developed the business tier using stateless session beans (acting as a Session Facade) and message-driven beans.
  • Used JDBC and Hibernate to connect to the Oracle database.
  • Configured data sources in the application server and accessed them from the DAOs through Hibernate.
  • Used the Business Delegate, Service Locator, and DTO design patterns to design the web module of the application.
  • Developed SQL stored procedures and prepared statements for updating and accessing data in the database.
  • Developed database-specific data access objects (DAOs) for Oracle.
  • Used CVS for source code control and JUnit for unit testing.
  • Used Eclipse to develop entity and session beans.
  • Deployed the entire application on WebSphere Application Server.
  • Followed coding and documentation standards.

Confidential

Junior SQL Developer

Responsibilities:

  • Involved in the full project lifecycle: requirements gathering, design, development, testing, and maintenance of the database.
  • Created database objects such as tables, stored procedures, views, clustered and non-clustered indexes, triggers, rules, defaults, user-defined data types, and functions.
  • Fine-tuned stored procedures, SQL queries, and user-defined functions using execution plans for better performance.
  • Created and scheduled SQL jobs to run SSIS packages daily using MS SQL Server Integration Services (SSIS).
  • Performed query optimization and tuning, debugging, and maintenance of stored procedures.
  • Created databases, assigned database security, and applied standard data modeling techniques.
  • Performed troubleshooting operations on the production servers.
  • Monitored, tuned, and analyzed database performance and allocated server resources to achieve optimal performance.
  • Created staging databases and import tables in MS SQL Server.
  • Loaded data into the systems using loader scripts, cursors, and stored procedures.
  • Tested data in the test environment, performed client validation, and resolved issues.
  • Developed reports using SSRS on SQL Server 2012.

Environment: Java, J2EE, JDK, JavaScript, XML, Struts, JSP, Servlets, JDBC, EJB, Hibernate, web services, JMS, JSF, JUnit, CVS, IBM WebSphere, Eclipse, Oracle 9i, Linux.
