Big Data Engineer Resume
Chicago, IL
SUMMARY
- 8+ years of overall software development experience in Big Data technologies, the Hadoop ecosystem, and SQL, with programming experience in Python, Scala, and Java.
- 3+ years of strong hands-on experience with the Hadoop ecosystem, including Spark, MapReduce, Hive, Pig, HDFS, YARN, HBase, Oozie, Kafka, and Sqoop.
- Experience in architecting, designing, and building distributed data pipelines.
- Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.
- Worked with real-time data processing and streaming techniques using Spark Streaming and Kafka.
- Experience in moving data into and out of HDFS and relational database systems (RDBMS) using Apache Sqoop.
- Experience developing Kafka producers and consumers for streaming millions of events per second.
- Significant experience writing custom UDFs in Hive and custom InputFormats in MapReduce.
- Knowledge of job workflow scheduling and monitoring tools such as Oozie and Airflow.
- Experience using various Hadoop Distributions (Cloudera, Hortonworks, Amazon AWS EMR) to fully implement and utilize various Hadoop services.
- Experience working with NoSQL databases like MongoDB, Cassandra and HBase.
- Used Hive extensively for performing various data analytics required by business teams.
- Solid experience working with various data formats such as Parquet, ORC, Avro, and JSON.
- Good experience in designing and implementing end-to-end data security and governance within the Hadoop platform using Kerberos.
- Hands-on experience developing end-to-end Spark applications using Spark APIs such as RDDs, the DataFrame API, Spark MLlib, Spark Streaming, and Spark SQL.
- Good experience working with various data analytics and big data services in the AWS Cloud, such as EMR, Redshift, S3, Athena, and Glue.
- Good understanding of Spark ML algorithms such as classification, clustering, and regression.
- Experienced in migrating data warehousing workloads into Hadoop-based data lakes using MapReduce, Hive, Pig, and Sqoop.
- Set up build and deployment automation for Java-based projects using Jenkins.
- Developed Spark applications in Python (PySpark) on a distributed environment to load large numbers of CSV files with differing schemas into Hive ORC tables (a representative sketch follows this section).
- Experience maintaining an Apache Tomcat, MySQL, LDAP, and web service environment.
- Designed ETL workflows in Tableau and loaded data from various sources into HDFS.
- Good experience with use-case development and software methodologies such as Agile and Waterfall.
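A minimal, illustrative PySpark sketch of the CSV-to-Hive-ORC loading pattern mentioned above; the paths, table name, and columns are hypothetical placeholders, not taken from any actual project.

    # Minimal sketch only: load CSV files into a Hive-backed ORC table with PySpark.
    # All paths, table names, and columns below are hypothetical.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("csv_to_hive_orc_sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Read CSV files with header and schema inference; files with differing schemas
    # are reconciled here by selecting a common set of columns before writing.
    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/data/landing/events/*.csv"))              # hypothetical landing path

    common_cols = ["event_id", "customer_id", "event_ts"]  # hypothetical columns
    (df.select(*common_cols)
       .write.mode("append")
       .format("orc")
       .saveAsTable("analytics.events"))                   # hypothetical Hive table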
TECHNICAL SKILLS
Big Data Technologies / Hadoop Components: HDFS, Hue, MapReduce, YARN, Sqoop, Pig, Hive, HBase, Oozie, Kafka, Impala, ZooKeeper, Flume, Cloudera Manager, Airflow
Spark: Spark SQL, Spark Streaming, DataFrames, YARN, pair RDDs
Cloud Services: AWS (S3, EC2, EMR, Lambda, RedShift, Glue), Azure (Azure Data Factory / ETL / ELT / SSIS, Azure Data Lake Storage, Azure Databricks)
Programming Languages: SQL, PySpark, Python, Scala, Java
Databases: Oracle, MySQL, DB2, SQL Server, Teradata
NoSQL Databases: HBase, Cassandra, MongoDB
Web Technologies: HTML, JDBC, JavaScript, CSS
Version Control Tools: GitHub, Bitbucket
Server-Side Scripting: UNIX Shell, PowerShell
IDE: Eclipse, PyCharm, Notepad++, IntelliJ, Visual Studio
Operating Systems: Linux, Unix, Ubuntu, Windows, CentOS
PROFESSIONAL EXPERIENCE
Big Data Engineer
Confidential - Chicago, IL
Responsibilities:
- Working as a Data Engineer using Big Data and Hadoop ecosystem components to build highly scalable data pipelines.
- Worked in an Agile development environment and participated in daily scrums and other design-related meetings.
- Involved in converting Hive/SQL queries into Spark transformations using PySpark.
- Worked on Spark SQL: created DataFrames by loading data from Hive tables, prepared the data, and stored it in AWS S3.
- Responsible for loading customer data and event logs from Kafka into Redshift through Spark Streaming (a representative sketch follows this section).
- Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
- Optimized Hive queries using best practices and appropriate parameters, leveraging technologies such as Hadoop, YARN, Python, and PySpark.
- Worked on reading and writing multiple data formats such as JSON, ORC, and Parquet on HDFS using PySpark.
- Analyzed SQL scripts and redesigned them using PySpark SQL for faster performance.
- Created Sqoop Scripts to import and export customer profile data from RDBMS to S3 buckets.
- Built custom input adapters to migrate clickstream data from FTP servers to S3.
- Developed various enrichment applications in Spark using Scala to cleanse and enrich clickstream data with customer profile lookups.
- Troubleshot Spark applications to improve fault tolerance and reliability.
- Used the Spark DataFrame API and Spark core APIs to implement batch processing jobs.
- Worked on fine-tuning and performance enhancement of various Spark applications and Hive scripts.
- Used Spark features such as broadcast variables, caching, and dynamic allocation to design more scalable Spark applications.
- Implemented continuous integration and deployment using CI/CD tools such as Jenkins, Git, and Maven.
Environment: AWS EMR, S3, Spark, Hive, Sqoop, Scala, Java, MySQL, Oracle DB, Athena, Redshift.
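A hedged sketch of the Kafka-to-Redshift streaming pattern referenced above, written as Spark Structured Streaming that lands events in S3 as Parquet, a common staging step before a Redshift COPY; the brokers, topic, schema, and bucket names are hypothetical.

    # Minimal sketch, not production code: consume events from Kafka with Spark Structured
    # Streaming and land them in S3 as Parquet for a downstream Redshift COPY.
    # Requires the spark-sql-kafka connector on the classpath; all names below are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("kafka_to_s3_staging_sketch").getOrCreate()

    event_schema = StructType([
        StructField("customer_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical brokers
           .option("subscribe", "customer-events")              # hypothetical topic
           .load())

    # Kafka values arrive as bytes; cast to string and parse the JSON payload.
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), event_schema).alias("e"))
              .select("e.*"))

    # Continuously append Parquet files to an S3 staging prefix.
    query = (events.writeStream
             .format("parquet")
             .option("path", "s3a://example-bucket/staging/customer_events/")          # hypothetical
             .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")  # hypothetical
             .start())
    query.awaitTermination()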
Spark Developer
Confidential - Minneapolis, MN
Responsibilities:
- Data ingestion into the data lake using an open-source Hadoop distribution to process structured, semi-structured, and unstructured datasets.
- Expertise in Hive queries: created user-defined aggregate functions, worked on advanced optimization techniques, and have extensive knowledge of joins.
- Created Hive scripts to extract, transform, load (ETL), and store data using Talend.
- Developed Sqoop Scripts to extract data from DB2 EDW source databases onto HDFS.
- Worked with Oracle and Teradata for data import/export operations from different data marts.
- Worked extensively on data migration, data cleansing, data profiling, and ETL processes for data warehouses.
- Hands-on experience in Azure Cloud: stored data in Azure Data Lake Storage (ADLS) and worked with App Services, Databricks clusters for running jobs, Azure SQL Database, Virtual Machines, the Fabric Controller, Azure AD, Azure Search, and Notification Hubs.
- Designed, configured, and deployed Microsoft Azure for a multitude of applications using the Azure stack (including Compute, Web & Mobile, Blobs, Resource Groups, Azure SQL, Cloud Services, and ARM), focusing on high availability, fault tolerance, and auto-scaling.
- Developed Spark applications using Python (PySpark).
- Developed and implemented API services using Python in Spark.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins (a representative sketch follows this section).
- Responsible for continuous monitoring and management of the Elastic MapReduce (EMR) cluster through the AWS console.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
Environment: Hadoop, Hive, Talend, MapReduce, Pig, Salesforce, Sqoop, Splunk, CDH5, Python, HDFS, DB2, Oozie, PuTTY, Java.
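A hedged illustration of the partition-and-bucket pattern behind the bucket-based joins above; this sketch uses Spark-managed bucketing through the DataFrameWriter rather than native Hive DDL, and all table, path, and column names are hypothetical.

    # Minimal sketch of partitioning by state and bucketing by customer_id with PySpark.
    # Spark-managed bucketing is used here; names below are hypothetical.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("partition_bucket_sketch")
             .enableHiveSupport()
             .getOrCreate())

    orders = spark.read.orc("/data/curated/orders")        # hypothetical source data

    # Partition by state and bucket by customer_id so joins on customer_id can avoid a full shuffle.
    (orders.write
           .partitionBy("state")
           .bucketBy(32, "customer_id")
           .sortBy("customer_id")
           .format("orc")
           .mode("overwrite")
           .saveAsTable("analytics.orders_by_state"))      # hypothetical metastore table

    # Join on the bucketing key; with both sides bucketed the same way, the shuffle can be skipped.
    customers = spark.table("analytics.customers")          # assumed bucketed dimension table
    joined = (spark.table("analytics.orders_by_state")
              .join(customers, "customer_id")
              .where("state = 'MN'"))
    joined.show()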
Data Engineer
Confidential - Scottsdale, AZ
Responsibilities:
- Built new universes in Business Objects per user requirements by identifying the required tables from the data mart and defining the universe connections.
- Used Business Objects to create reports based on SQL queries. Generated executive dashboard reports with the latest company financial data by business unit and by product.
- Performed data analysis and mapping, database normalization, performance tuning, query optimization, data extraction, transfer, and loading (ETL), and cleanup.
- Implemented Teradata RDBMS analysis with Business Objects to develop reports, interactive drill charts, balanced scorecards, and dynamic Dashboards.
- Responsible for requirements gathering, status reporting, creating various metrics, and project deliverables.
- Developed PL/SQL procedures, functions, and packages, and used SQL*Loader to load data into the database.
- Designed and developed Informatica mappings to load data from source systems. Worked on the Informatica PowerCenter tools: Source Analyzer, Warehouse Designer, Mapping/Mapplet Designer, and Transformation Designer.
- Involved in migrating warehouse database from Oracle 9i to 10g database.
- Involved in analyzing and adding new Oracle 10g features such as DBMS_SCHEDULER, CREATE DIRECTORY, Data Pump, and CONNECT BY ROOT to the existing Oracle 9i application.
- Tuned report performance by exploiting Oracle's new built-in functions and rewriting SQL statements.
Environment: SQL Server, J2EE, UNIX, .NET, MS Project, Oracle, WebLogic, Shell script, JavaScript, HTML, Microsoft Office Suite 2010, Excel
Hadoop Developer
Confidential
Responsibilities:
- Involved in requirements analysis and the design of an object-oriented domain model.
- Designed use case diagrams, class diagrams, sequence diagrams, and object diagrams.
- Involved in designing user screens using HTML as per user requirements.
- Used Spring-Hibernate integration in the back end to fetch data from Oracle and MySQL databases.
- Used Spring Dependency Injection properties to provide loose-coupling between layers.
- Implemented the Web Service client for the login authentication, credit reports and applicant information.
- Used Web services (SOAP) for transmission of large blocks of XML data over HTTP.
- Implemented the logging mechanism using Log4j framework.
- Wrote test cases in JUnit for unit testing of classes.
- Developed application to be implemented on Windows XP.
- Created the application using the Eclipse IDE.
- Installed WebLogic Server for handling HTTP requests and responses.
- Used Subversion for version control and created automated build scripts.