Big Data Engineer Resume
Chicago, IL
SUMMARY
- Big Data Engineer/Hadoop Developer with 6+ years of experience in business processes, design strategies, data analytics solution development, and workflow implementation.
- Hands-on experience with Hadoop and Spark Big Data technologies, covering storage, querying, processing, and analysis of data.
- Experienced in using Hadoop ecosystem tools such as MapReduce, Hive, Sqoop, Impala, Avro, and HDFS.
- Worked extensively with Python and Java, and with databases including MySQL, Oracle, PostgreSQL, and Microsoft SQL Server.
- Hands-on experience with Hadoop cluster managers and tools such as Cloudera Manager, Apache Ambari, and Hue.
- Experienced in developing SQL, Python, and shell scripts to schedule regularly running processes.
- Proficient with the Git version control system for code sharing and collaboration.
- Experienced in creating ad-hoc and summary reports using advanced Excel, SQL, and Tableau.
- Experienced in collecting log data from various sources and integrating it into HDFS using Flume.
- Experienced in testing data in HDFS and Hive for each data transaction.
- Experienced in importing and exporting data between HDFS and relational database systems using Sqoop.
- Extensive knowledge of programming with Resilient Distributed Datasets (RDDs); a brief PySpark sketch follows this summary.
- Experienced in using Flume to transfer log data files into the Hadoop Distributed File System (HDFS).
- Good experience in shell scripting.
- Knowledgeable in managing Cloudera's Hadoop platform and CDH clusters.
- Proficient in writing complex SQL queries against databases such as Oracle, SQL Server, PostgreSQL, and MySQL.
- Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
- Experience working with operational data sources and migrating data from traditional databases to Hadoop.
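The RDD experience above can be illustrated with a minimal PySpark sketch; the HDFS path, log layout, and field positions below are hypothetical placeholders rather than details from any specific engagement.

```python
# Minimal RDD-based processing sketch in PySpark.
# The log path and field layout are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-sketch").getOrCreate()
sc = spark.sparkContext

# Load raw log lines from HDFS into an RDD (path is hypothetical).
lines = sc.textFile("hdfs:///data/logs/app.log")

# Count events per status code, assuming a space-delimited format
# where the status code is the third field.
status_counts = (
    lines.map(lambda line: line.split(" "))
         .filter(lambda fields: len(fields) >= 3)
         .map(lambda fields: (fields[2], 1))
         .reduceByKey(lambda a, b: a + b)
)

for status, count in status_counts.collect():
    print(status, count)

spark.stop()
```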
TECHNICAL SKILLS
Databases: Oracle 11g/12c, MySQL, MS SQL Server, PostgreSQL, Teradata, IBM DB2
Programming Languages: C, Java, Scala, Python, PL/SQL, Pig Latin, HiveQL
Python Libraries: Pandas, NumPy, NLTK, Scikit-learn
Hadoop/Big Data: HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, Hue, ZooKeeper
Java/J2EE & Web Technologies: J2EE, EJB, JSF, Servlets, JSP, JSTL, CSS, HTML, XHTML, XML, AngularJS, AJAX, JavaScript, jQuery
Development Tools: Eclipse, NetBeans, SVN, Git, Ant, Maven, SoapUI, JMX explorer, XML Spy, QC, QTP, Jira, SQL Developer, QTOAD
Methodologies: Agile/Scrum, UML, Rational Unified Process and Waterfall.
NoSQL Technologies: Cassandra, MongoDB, HBase.
Frameworks: Struts, Hibernate, And Spring MVC.
Scripting Languages: Unix Shell Scripting, Perl.
Distributed Platforms: Hortonworks, Cloudera, MapR
Operating Systems: Windows XP/Vista/7/8/10, UNIX, Linux
Software Packages: MS Office 2007/2010/2016
Web/Application Servers: WebLogic, WebSphere Application Server, Apache Tomcat
Visualization: Tableau, QlikView, MicroStrategy, and MS Excel
Version Control: CVS, SVN, Git, TFS
Web Technologies: HTML, XML, CSS, JavaScript, and jQuery, AJAX, AngularJS, SOAP, REST and WSDL.
PROFESSIONAL EXPERIENCE
Confidential, Chicago, IL
Big Data Engineer
Responsibilities:
- Developed analytical solutions, data strategies, tools, and technologies for the marketing platform using Big Data technologies.
- Implemented solutions for ingesting data from various sources using Big Data technologies such as Hadoop, the MapReduce framework, Sqoop, and Hive.
- Worked as a Hadoop consultant on technologies including MapReduce, Pig, Hive, and Sqoop.
- Worked with the PySpark API.
- Worked with Apache Hadoop ecosystem components such as HDFS, Hive, Sqoop, Pig, and MapReduce.
- Performed real-time and near-real-time analytics on big data platforms such as Hadoop and Spark using Python.
- Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers.
- Wrote Hadoop jobs to analyze data in text, sequence, and Parquet file formats using Hive and Pig.
- Analyzed the Hadoop cluster and various Big Data components including Pig, Hive, Spark, Impala, and Sqoop.
- Developed Spark code using Python and Spark SQL for faster testing and data processing.
- Monitored metrics and created backend reports and dashboards in Tableau.
- Developed predictive analytics using PySpark APIs.
- Performed big data analysis using Pig and Hive.
- Created Hive external tables, loaded data into them, and queried the data using HiveQL.
- Imported millions of structured records from relational databases using Sqoop, stored them in HDFS as CSV files, and processed them with Spark (a sketch of this pattern follows the Environment line below).
- Used Spark SQL to process large volumes of structured data.
- Extracted data from MySQL and AWS Redshift into HDFS using Sqoop.
- Worked with tools such as Flume, Sqoop, Hive, and PySpark.
- Wrote business analytics scripts using HiveQL.
Environment: Big Data, Spark, YARN, Hive, Flume, Pig, Python, Hadoop, AWS, Databases, Redshift.
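A minimal sketch of the Sqoop-landing-then-Spark pattern described in this role: Sqoop writes relational data to HDFS as CSV, and PySpark/Spark SQL process it and declare a Hive external table over the same location. The paths, schema, and table/column names are hypothetical placeholders.

```python
# Sketch: process Sqoop-landed CSV data with PySpark and Spark SQL.
# Paths, columns, and table names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("sqoop-landing-processing")
    .enableHiveSupport()   # assumes a Hive metastore is configured
    .getOrCreate()
)

# Read the CSV files written by a prior Sqoop import (path hypothetical).
orders = (
    spark.read
    .option("header", "false")
    .option("inferSchema", "true")
    .csv("hdfs:///landing/sqoop/orders/")
    .toDF("order_id", "customer_id", "order_date", "amount")
)

# Expose the DataFrame to Spark SQL and run an aggregation.
orders.createOrReplaceTempView("orders")
daily_totals = spark.sql("""
    SELECT order_date, COUNT(*) AS order_cnt, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
""")
daily_totals.show()

# A Hive external table over the same landing directory can be
# declared with HiveQL through the same session (DDL is illustrative).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS orders_ext (
        order_id INT, customer_id INT, order_date STRING, amount DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 'hdfs:///landing/sqoop/orders/'
""")
```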
Confidential, Malvern, PA
Big Data Developer
Responsibilities:
- Worked as a Big Data Developer on the team handling the firm's proprietary platform issues, providing data analysis for the team and developing enhancements.
- Worked with large sets of big data, dealing with various security logs.
- Loaded data from relational databases into HDFS using Sqoop, and handled data received from different vendors as flat files, text data, XML data, etc.
- Developed MapReduce jobs for data cleaning and manipulation.
- Migrated data from existing RDBMSs (MySQL and SQL Server) to Hadoop using Sqoop for processing and analysis.
- Good working knowledge of implementing solutions using AWS services such as EC2, S3, and Redshift.
- Performed file system management and monitoring on Hadoop log files.
- Wrote Pig and Hive jobs to extract data from MongoDB through Sqoop and place it in HDFS.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers, mobile devices, and network devices, and pushed it to HDFS.
- Developed data frames using Spark SQL as needed.
- Wrote Hive join queries to fetch information from multiple tables, and MapReduce jobs to collect data from Hive.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for dashboard reporting (a sketch of this pattern follows the Environment line below).
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Involved in configuring and maintaining cluster, and managing & reviewing Hadoop log files.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Analyzed large amounts of data to determine the optimal way to aggregate them, and reported the findings.
- Explored the Spark framework and methods for improving the performance and optimization of existing Hadoop jobs using Spark Context, Spark SQL, DataFrames, and YARN.
Environment: MySQL, SQL Server, Python, Hadoop, HDFS, Hive, MapReduce, Cloudera, Pig, Sqoop, Impala, Flume, PySpark, Spark SQL.
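A minimal sketch of querying partitioned Hive data and computing join-based metrics with Spark SQL and the DataFrame API, as referenced above; the table names, columns, and partition values are hypothetical.

```python
# Sketch: join a partitioned Hive fact table with a dimension table.
# Table names, columns, and the partition value are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("hive-join-metrics")
    .enableHiveSupport()   # assumes the tables exist in the Hive metastore
    .getOrCreate()
)

# HiveQL join with a partition filter that prunes to a single day's data.
metrics = spark.sql("""
    SELECT w.event_dt, d.device_type, COUNT(*) AS hits
    FROM web_logs w
    JOIN devices d ON w.device_id = d.device_id
    WHERE w.event_dt = '2018-01-01'
    GROUP BY w.event_dt, d.device_type
""")
metrics.show()

# The same result expressed with the DataFrame API.
web_logs = spark.table("web_logs").where(F.col("event_dt") == "2018-01-01")
devices = spark.table("devices")
metrics_df = (
    web_logs.join(devices, "device_id")
            .groupBy("event_dt", "device_type")
            .count()
)
metrics_df.show()
```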
Confidential
Java Developer
Responsibilities:
- Developed the web interface using Struts, JavaScript, HTML, and CSS.
- Extensively used Struts controller component classes in developing the applications.
- Developed the business tier using stateless session beans (acting as a Session Facade) and message-driven beans.
- Used JDBC and Hibernate to connect to the Oracle database.
- Configured data sources in the application server and accessed them from the DAOs through Hibernate.
- Used the Business Delegate, Service Locator, and DTO design patterns in designing the web module of the application.
- Developed SQL stored procedures and prepared statements for updating and accessing data in the database.
- Developed database-specific data access objects (DAOs) for Oracle.
- Used CVS for source code control and JUnit for unit testing.
- Used Eclipse to develop entity and session beans.
- Deployed the entire application on WebSphere Application Server.
- Followed coding and documentation standards.
Environment: Java, J2EE, JDK, JavaScript, XML, Struts, JSP, Servlets, JDBC, EJB, Hibernate, Web services, JMS, JSF, JUnit, CVS, IBM WebSphere, Eclipse, Oracle 9i, Linux.
Confidential
Junior SQL Developer
Responsibilities:
- Involved in the life-cycle of the project, i.e., requirements gathering, design, development, testing and maintenance of the database.
- Created Database Objects like Tables, Stored Procedures, Views, Clustered and Non-Clustered indexes, Triggers, Rules, Defaults, User defined data types and functions.
- Fine-tuned stored procedures, SQL queries, and user-defined functions using execution plans for better performance.
- Created and scheduled SQL jobs to run SSIS packages daily using MS SQL Server Integration Services (SSIS).
- Performed query optimization and tuning, debugging, and maintenance of stored procedures.
- Created databases, assigned database security, and applied standard data modeling techniques.
- Performed troubleshooting operations on the production servers.
- Monitored, tuned, and analyzed database performance and allocated server resources to achieve optimum database performance.
- Created staging databases and import tables in MS SQL Server.
- Loaded data into the systems using loader scripts, cursors, and stored procedures.
- Tested data in the test environment, performed client validation, and resolved issues.
- Developed reports with SSRS on SQL Server 2012.